
Whisper V3

accounts/fireworks/models/whisper-v3

Serverless · Audio

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting.

Serverless API

Whisper V3 is available via Fireworks' Speech-to-Text APIs, where you are billed based on the duration of the transcribed audio. The API supports multiple languages and additional features, including forced alignment.
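Because billing is based on the duration of the transcribed audio, it can be useful to check a file's length before uploading. A minimal sketch for uncompressed WAV files using only Python's standard library (compressed formats such as MP3 need a third-party library; the function name here is illustrative, not part of the Fireworks API):

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        # Total sample frames divided by frames per second gives duration.
        return w.getnframes() / w.getframerate()
```

Multiplying the result by your per-minute rate gives a rough cost estimate before you send the request.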

You can call the Fireworks Speech-to-Text API using HTTP requests from any language. See the API reference for full details.


API Examples

Generate a transcription using the speech-transcription endpoint of whisper-v3. See the API reference for the full parameter list.

import requests

# Open the audio file in binary mode and send it as a multipart upload.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions",
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "model": "accounts/fireworks/models/whisper-v3",
            "temperature": "0",      # deterministic decoding
            "vad_model": "silero",   # voice-activity detection model
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
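A successful call returns JSON that, following the OpenAI-style transcription response shape, typically carries the transcript in a top-level `text` field. A sketch of pulling the transcript out of such a payload; the sample dictionary below is hypothetical, not an actual API response:

```python
def extract_transcript(payload: dict) -> str:
    """Return the transcript text from a transcription response payload."""
    # Fall back to an empty string if the field is absent.
    return payload.get("text", "")

# Hypothetical sample of the JSON a successful request may return.
sample = {"text": "Hello from Whisper.", "language": "en"}
print(extract_transcript(sample))  # → Hello from Whisper.
```

In practice you would pass `response.json()` to a helper like this after checking the status code, and inspect any segment-level fields separately if you requested alignment.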