
Integrating Whisper API for voice features in Laravel

By Alpesh Nakrani

Step-by-step guide to integrating OpenAI Whisper API for voice features in Laravel. Real code, real results, and what tripped us up first.


Voice features in web apps used to require a dedicated audio engineering team, expensive third-party SDKs, and weeks of integration work. OpenAI’s Whisper API changed that.

I integrated Whisper API into a Laravel application for a client at Devlyn.ai in about two days. Not prototype-level integration. Production integration, with queued jobs, error handling, cost controls, and a Filament admin panel for reviewing transcriptions. This article is the guide I wish I’d had when I started.

Integrating Whisper API for voice features in Laravel is genuinely achievable in a single sprint if you know what you’re building toward and where the edge cases hide.

Here’s exactly how we did it.


What Whisper API does (and what it doesn’t)

Whisper is OpenAI’s speech-to-text model. You send it an audio file; it returns a text transcription. That’s the core. It supports 57 languages, handles accents well, and outperforms most commercial alternatives on accuracy for technical vocabulary.

According to OpenAI’s Whisper documentation, the model accepts audio files up to 25MB in formats including mp3, mp4, wav, m4a, and webm. Transcription is synchronous. You send a file, you wait, you get text back, typically in 2 to 10 seconds depending on file length.

What it doesn’t do: real-time streaming transcription. Whisper is a batch model. If you need live, word-by-word transcription as someone speaks (think live captions), Whisper is not the right tool. For batch transcription of recorded audio, meeting notes, voice memos, user-uploaded content, or voice commands on form fields, it’s excellent.

The feature we were building: a Laravel SaaS where users could record voice notes that were automatically transcribed, tagged, and searchable. Classic Whisper use case.


Setting up the OpenAI PHP client in Laravel

Start with the OpenAI PHP client for Laravel. The openai-php/laravel package wraps the widely used, community-maintained openai-php/client and registers the facade and config file used below. I’d recommend it over rolling your own API wrapper.

composer require openai-php/laravel

Add your OpenAI API key to .env:

OPENAI_API_KEY=sk-your-key-here
OPENAI_ORGANIZATION=org-your-org-here

Publish and configure the package:

php artisan vendor:publish --provider="OpenAI\Laravel\ServiceProvider"

This creates config/openai.php. The defaults are fine. The key things it configures:

  • API key and org ID from your .env
  • HTTP client timeout (default 30 seconds, you’ll want to raise this for long audio files)
  • Retry settings
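Raising the timeout is a one-line change in config/openai.php. The key name below matches current versions of the package; verify it against your published config file:

```php
// config/openai.php — give long audio files room to finish
'request_timeout' => env('OPENAI_REQUEST_TIMEOUT', 120),
```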

Now you can use the OpenAI facade anywhere in your Laravel app:

use OpenAI\Laravel\Facades\OpenAI;

$response = OpenAI::audio()->transcribe([
    'model' => 'whisper-1',
    'file' => fopen('/path/to/audio.mp3', 'r'),
    'response_format' => 'json',
]);

echo $response->text; // Your transcription

That’s the basic integration. In production, you don’t call this inline. You queue it.


Building the queued transcription job

Transcription takes seconds. Web requests shouldn’t wait for it. We put every Whisper API call inside a Laravel queued job.

First, the model. We created a Transcription model to store audio file references and transcription results:

php artisan make:model Transcription -m

Migration:

Schema::create('transcriptions', function (Blueprint $table) {
    $table->id();
    $table->foreignId('user_id')->constrained()->cascadeOnDelete();
    $table->string('audio_path');
    $table->text('transcript')->nullable();
    $table->string('language', 10)->nullable();
    $table->float('duration')->nullable();
    $table->enum('status', ['pending', 'processing', 'completed', 'failed'])->default('pending');
    $table->text('error_message')->nullable();
    $table->timestamps();
});

Then the job:

php artisan make:job TranscribeAudio

<?php

namespace App\Jobs;

use App\Models\Transcription;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use OpenAI\Laravel\Facades\OpenAI;

class TranscribeAudio implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;
    public int $timeout = 120;

    public function __construct(
        public readonly Transcription $transcription
    ) {}

    public function handle(): void
    {
        $this->transcription->update(['status' => 'processing']);

        try {
            $response = OpenAI::audio()->transcribe([
                'model' => 'whisper-1',
                'file' => fopen(storage_path('app/' . $this->transcription->audio_path), 'r'),
                'response_format' => 'verbose_json',
                'language' => 'en',
            ]);

            $this->transcription->update([
                'transcript' => $response->text,
                'language' => $response->language,
                'duration' => $response->duration,
                'status' => 'completed',
            ]);
        } catch (\Exception $e) {
            $this->transcription->update([
                'status' => 'failed',
                'error_message' => $e->getMessage(),
            ]);

            throw $e; // Re-throw so Laravel retries the job
        }
    }
}

A few things worth noting in this job:

  • $tries = 3 means Laravel will attempt the job up to three times in total before marking it as failed. OpenAI API calls occasionally time out or return 429 rate-limit errors, and a retry usually succeeds.
  • $timeout = 120 gives the job two minutes. A 25MB audio file can take 30 to 60 seconds to transcribe. The default job timeout would kill it.
  • verbose_json as the response format gives you language detection and duration as well as the transcript, useful for logging and the admin UI.
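One refinement worth adding, shown here as a sketch rather than a piece of our original job: Laravel calls a job’s failed() method once all retries are exhausted, which is a natural place for final bookkeeping or an alert.

```php
// Sketch: add to the TranscribeAudio job. Runs once, after all
// retries are exhausted, so it's a good place for alerts or a
// final status update.
public function failed(\Throwable $exception): void
{
    $this->transcription->update([
        'status' => 'failed',
        'error_message' => $exception->getMessage(),
    ]);

    // Optionally notify an admin channel here.
}
```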

Handling audio uploads from the browser

The controller that receives the uploaded file and dispatches the job:

<?php

namespace App\Http\Controllers;

use App\Jobs\TranscribeAudio;
use App\Models\Transcription;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Storage;

class TranscriptionController extends Controller
{
    public function store(Request $request)
    {
        $request->validate([
            'audio' => ['required', 'file', 'mimes:mp3,mp4,wav,m4a,webm', 'max:25600'],
        ]);

        $path = $request->file('audio')->store('audio', 'local');

        $transcription = Transcription::create([
            'user_id' => auth()->id(),
            'audio_path' => $path,
            'status' => 'pending',
        ]);

        TranscribeAudio::dispatch($transcription);

        return response()->json([
            'transcription_id' => $transcription->id,
            'status' => 'pending',
        ], 202);
    }
}

On the frontend, we used a simple Vue component with the MediaRecorder API to capture browser audio and POST it to this endpoint. The 202 response tells the client “we have it, we’re working on it.” The frontend polls a status endpoint every two seconds until the transcription status is “completed.”
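The status endpoint the frontend polls isn’t shown above. A minimal sketch, assuming route-model binding and a per-user ownership check (both assumptions, not our exact code):

```php
// Sketch of the polled status endpoint: GET /transcriptions/{transcription}
public function show(Transcription $transcription)
{
    abort_unless($transcription->user_id === auth()->id(), 403);

    return response()->json([
        'status' => $transcription->status,
        // Populated only once the job has completed or failed.
        'transcript' => $transcription->transcript,
        'error' => $transcription->error_message,
    ]);
}
```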

Mini-story: The client who discovered the file size limit

Early in the integration, a client named Sarah was testing the feature by uploading an hour-long meeting recording. The upload succeeded. The job failed. No error message in the UI because we hadn’t handled the failure state gracefully yet.

The issue: her file was 38MB. Whisper’s API limit is 25MB. We should have validated file size on the controller. We hadn’t.

The fix took 10 minutes. The lesson took longer to internalize: validate against the API’s constraints at the upload boundary, not inside the job. By the time the job runs, the file is already in storage. Returning a useful error at upload time is far better than a cryptic job failure 30 seconds later.

We added the max:25600 validation (25MB in kilobytes) to the controller after that. We also added a check for audio duration using getID3 before dispatching the job, rejecting uploads over 30 minutes with a helpful error message.
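The duration check leaned on the getID3 library (james-heinrich/getid3 on Packagist). A rough sketch of the getAudioDuration helper referenced later, assuming getID3’s analyze() output shape; treat it as a sketch, not a drop-in:

```php
use getID3; // composer require james-heinrich/getid3

// Sketch: return audio duration in seconds, or null if unreadable.
private function getAudioDuration(string $absolutePath): ?float
{
    $info = (new getID3())->analyze($absolutePath);

    return isset($info['playtime_seconds'])
        ? (float) $info['playtime_seconds']
        : null;
}
```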


Cost controls: what Whisper API actually costs

Whisper API pricing is $0.006 per minute of audio as of early 2026. That sounds cheap. It compounds fast at scale.

A user who records 10 voice notes per day at an average of five minutes each is generating 50 minutes of audio per day. At $0.006/min, that’s $0.30/day per heavy user, or $9/month. If you have 100 heavy users, you’re spending $900/month on Whisper alone.

We built two cost controls from the start:

Per-user monthly limit: Each user gets a configurable transcription budget (in minutes) per billing period. Stored in the database, decremented after each successful transcription, reset on billing cycle renewal.

Pre-transcription duration check: We check audio duration before dispatching the job. Anything over a configured maximum (we used 60 minutes) is rejected with a clear message. This prevents a single upload from running an unexpectedly large API bill.

// In the controller, after storing the file
$duration = $this->getAudioDuration(storage_path('app/' . $path));

if ($duration > config('transcription.max_duration_seconds', 3600)) {
    Storage::delete($path);

    return response()->json(['error' => 'Audio file exceeds maximum duration.'], 422);
}

$user = auth()->user();
$minutesUsed = $user->transcription_minutes_used_this_period;
$minutesAllowed = $user->plan->transcription_minutes_per_period;

if (($minutesUsed + ($duration / 60)) > $minutesAllowed) {
    Storage::delete($path);

    return response()->json(['error' => 'Monthly transcription limit reached.'], 422);
}

Building these controls in from day one saved us from a surprise bill. I’ve seen teams add cost controls after their first expensive month. Don’t do that.
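The budget bookkeeping itself happens in the job, after a successful transcription. A sketch, reusing the column name from the controller check above (the relationship and column names are assumptions from our schema):

```php
// Sketch: inside TranscribeAudio::handle(), after marking the
// transcription completed. Whisper's verbose_json response
// includes the audio duration in seconds.
$this->transcription->user->increment(
    'transcription_minutes_used_this_period',
    $response->duration / 60
);
```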


Building the Filament admin panel for transcriptions

We used Laracopilot to generate the initial Filament resource for the Transcription model. It produced a usable admin panel with filters, search, and a detail view in minutes. We customized from there.

The admin resource we ended up with:

public static function table(Table $table): Table
{
    return $table
        ->columns([
            TextColumn::make('user.name')->label('User')->searchable(),
            BadgeColumn::make('status')
                ->colors([
                    'warning' => 'pending',
                    'info' => 'processing',
                    'success' => 'completed',
                    'danger' => 'failed',
                ]),
            TextColumn::make('duration')
                ->formatStateUsing(fn ($state) => gmdate('H:i:s', (int) $state))
                ->label('Duration'),
            TextColumn::make('created_at')->dateTime()->sortable(),
        ])
        ->filters([
            SelectFilter::make('status')
                ->options([
                    'pending' => 'Pending',
                    'processing' => 'Processing',
                    'completed' => 'Completed',
                    'failed' => 'Failed',
                ]),
        ]);
}

Having real-time visibility into transcription status, failures, and duration data was critical for debugging the integration during the first two weeks in production.


The edge cases that will trip you up

After running this in production for several months, here are the real problems that don’t appear in tutorials:

Background noise and transcription accuracy: Whisper handles clean audio well. Users recording in noisy environments or on low-quality phone microphones get significantly lower accuracy. We added a UI hint (“Better results in quiet environments”) and considered filtering on quality, which can be approximated from the per-segment log probabilities that verbose_json returns.

Non-English audio: Whisper supports many languages, but accuracy varies by language and accent. We found it handled Indian English well, which matters for our user base. For mixed-language audio (code-switching), results were inconsistent. We now let users specify the language manually when accuracy matters.

File format edge cases: Some browsers produce WebM audio that has valid MIME type but unusual codec combinations. We added ffmpeg server-side conversion for uploaded WebM files before sending to Whisper. This added complexity but eliminated format-related failures.
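The conversion step can be sketched with Symfony’s Process component, which ships with Laravel. The output format and ffmpeg flags below are assumptions, not our exact pipeline; adjust for your ffmpeg build:

```php
use Symfony\Component\Process\Process;

// Sketch: re-encode a browser-produced WebM file to mp3 before
// sending it to Whisper. Requires ffmpeg on the server's PATH.
$converted = preg_replace('/\.webm$/', '.mp3', $sourcePath);

$process = new Process([
    'ffmpeg', '-y',      // overwrite output if it exists
    '-i', $sourcePath,   // input WebM file
    '-vn',               // drop any video stream
    '-ar', '16000',      // 16 kHz is plenty for speech
    $converted,
]);

$process->run();

if (! $process->isSuccessful()) {
    throw new \RuntimeException('ffmpeg conversion failed: ' . $process->getErrorOutput());
}
```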

Retry flooding: When the OpenAI API has a brief outage, all pending jobs fail simultaneously and retry at the same time. We added exponential backoff to the job retry logic to avoid hammering the API:

public function backoff(): array
{
 return [10, 30, 60]; // retry after 10s, 30s, 60s
}

Mini-story: The feature that changed how customers used the product

Three weeks after launching voice transcription, a customer named Daniel sent me a message. He was using the feature differently than we’d designed it.

Instead of recording meeting notes (our intended use case), he was using it to transcribe his own code review walkthroughs, recording himself walking through a codebase and narrating what he saw. He’d share the transcript with his team instead of writing documentation.

We hadn’t thought of that. It became the most-used feature pattern in the product. We built a “code walkthrough” mode specifically for that workflow, with prompt formatting that structured the transcript as a review document.

Whisper didn’t suggest that use case. Daniel did. The lesson: ship the integration fast, then watch how customers surprise you.


Connecting voice features to the broader Laravel AI stack

Voice transcription is one piece of a larger AI feature stack in Laravel. The pattern I’ve found most effective:

  1. Whisper for speech to text
  2. OpenAI GPT-4 (via the same openai-php/client) for summarization, classification, and structured extraction from the transcript
  3. Laravel Scout + Meilisearch for making transcripts searchable
  4. Filament for internal visibility and manual review
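Step 2 chains naturally off the transcription job: once a transcript is saved, hand it to a chat model through the same client. A hedged sketch, where the model name and prompt are illustrative rather than what we shipped:

```php
use OpenAI\Laravel\Facades\OpenAI;

// Sketch: summarize a completed transcript with a chat model.
$response = OpenAI::chat()->create([
    'model' => 'gpt-4o',
    'messages' => [
        ['role' => 'system', 'content' => 'Summarize the following voice note in three bullet points.'],
        ['role' => 'user', 'content' => $transcription->transcript],
    ],
]);

$summary = $response->choices[0]->message->content;
```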

The build Laravel apps with AI guide on Laracopilot’s blog covers scaffolding AI-connected apps in more depth. If you’re building a voice feature alongside other AI capabilities, that’s a useful companion read.

For the full Devlyn.ai engineering blog covering AI integrations, see devlyn.ai/blog.


The bottom line

Integrating Whisper API for voice features in Laravel is approachable. The official PHP client works well. The queued job pattern is reliable. The Filament admin panel makes the integration observable.

The work that actually takes time: handling edge cases in production (file formats, retries, cost controls), designing the UX around async transcription, and watching how customers use what you’ve built differently than you expected.

Start with the basic integration. Ship it to a small set of users. Then build the edge case handling based on what you see in production, not what you anticipate on the drawing board.

If you’re building a Laravel SaaS with AI features and want to shortcut the scaffolding work, try Laracopilot free. It generates the model, migration, Filament resource, and queued job structure in minutes. The integration work you do on top of that is the interesting part.

Questions about the integration? I answer technical questions from builders in my weekly newsletter. Worth subscribing if you’re shipping Laravel AI features.


Alpesh Nakrani is VP of Growth at Devlyn.ai and Laracopilot. He writes about Laravel, AI tooling, and SaaS growth at alpeshnakrani.com.