POST /v1/transcribe

Creates a new transcription job. The finished transcript is sent to the webhook URL provided in the request.

This endpoint accepts either a remote file referenced by URL or a local file uploaded as multipart form data.

If you use the enhanced model, the maximum audio input duration is currently 60 minutes (this will be increased in the near future).

Transcribe a Remote File (from URL)

If you have a media file accessible via a URL, you can provide the URL to the file in the request body with the header Content-Type set to application/json.

Typically you would use this method if you have a file stored in a cloud storage service such as Amazon S3.

Use the url field in the body of the request to provide the URL to the file.

Make sure the URL to the file is publicly accessible; otherwise, our endpoint cannot read the file.
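As an illustration of the remote-file variant, the following is a minimal Python sketch that builds the JSON request using only the standard library. The host `https://api.example.com` and the helper name `build_remote_transcribe_request` are assumptions made for this example; substitute the real base URL of the API.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
API_BASE = "https://api.example.com"


def build_remote_transcribe_request(
    token: str, file_url: str, webhook: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/transcribe request for a remote file."""
    body = json.dumps({"url": file_url, "webhook": webhook}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/transcribe",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON response body.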

Transcribe a Local File

If you have a media file stored locally, you can provide the file in the request body with the header Content-Type set to multipart/form-data.

Use the file field in the body of the request to upload the file.

Typically you would use this method if you have a file stored on a local machine or device.
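A local upload could be sketched as follows, encoding the multipart body by hand with the Python standard library. The host `https://api.example.com` and both helper names are assumptions made for this example. Sending the other parameters (here `webhook`) as plain form fields next to `file` is also an assumption, since this page only documents the `file` field for uploads.

```python
import uuid
import urllib.request
from typing import Dict, Optional, Tuple

# Assumption: placeholder host; substitute the real API base URL.
API_BASE = "https://api.example.com"


def encode_multipart(
    fields: Dict[str, str],
    filename: str,
    file_bytes: bytes,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = f"--{boundary}".encode("ascii")
    chunks = []
    # Plain text form fields come first.
    for name, value in fields.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    # The media file goes into the "file" field.
    chunks += [
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return b"\r\n".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_local_transcribe_request(
    token: str, filename: str, file_bytes: bytes, webhook: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/transcribe request for a local file."""
    body, content_type = encode_multipart({"webhook": webhook}, filename, file_bytes)
    return urllib.request.Request(
        f"{API_BASE}/v1/transcribe",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

The file contents can be read with `pathlib.Path("meeting.mp3").read_bytes()` before calling `build_local_transcribe_request`. For large files, an HTTP client with streaming uploads is likely a better fit than holding the whole file in memory.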

Receiving the transcript (Webhook)

The webhook URL is where the finished transcript will be sent. The transcript will be sent as a JSON object in the request body.

Make sure the webhook URL is publicly accessible; otherwise, our endpoint cannot send the transcript.

The request body of the webhook follows the Transcript schema, as specified on the Transcript Schema page.
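A webhook receiver might look like the sketch below, written with Python's built-in `http.server`. It assumes the transcript arrives as an HTTP POST, which this page does not state explicitly, and it treats the payload as a generic JSON object because the Transcript schema is documented elsewhere. The names `parse_transcript`, `TranscriptWebhookHandler`, and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_transcript(raw_body: bytes) -> dict:
    """Decode the webhook request body into a dict, rejecting non-objects."""
    payload = json.loads(raw_body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class TranscriptWebhookHandler(BaseHTTPRequestHandler):
    # Assumption: the transcript is delivered with an HTTP POST.
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            transcript = parse_transcript(self.rfile.read(length))
        except ValueError:
            # Malformed JSON, bad encoding, or a non-object payload.
            self.send_response(400)
            self.end_headers()
            return
        # Hand the transcript to your own storage or processing here.
        print("received transcript with keys:", sorted(transcript))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces until interrupted."""
    HTTPServer(("", port), TranscriptWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. The machine running it must be reachable from the public internet for the transcript to be delivered.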

Job status and transcript output

You can view the status of the job by using the Retrieve Job endpoint.

The output of the job is saved in the output field of the job object and is deleted after 24 hours.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

url
string
required

Direct URL of an audio or video file. Maximum file size is 2 GB. Make sure to use a publicly accessible URL.

model
string

Model used for the transcript. 'standard' or 'enhanced', default is 'standard'.

numSpeakers
integer

Number of speakers in the transcript. If null, the number of speakers will be detected automatically.

language
string

Language of the transcript, given as an ISO 639-1 code such as "en" or "de". If null, the language will be detected automatically.

vocabulary
string

Vocabulary used for the transcript, similar to the "initial_prompt" parameter of Whisper. Provide acronyms, names, and foreign words.

webhook
string

URL to which the transcript will be sent once it is ready. Make sure to use a publicly accessible URL.
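The body parameters above can be assembled with a small helper such as the following Python sketch. The function name and the client-side checks are hypothetical. Leaving unset optional fields out of the payload, on the assumption that the API treats a missing field like null, is also a guess and not documented behavior.

```python
from typing import Any, Dict, Optional

ALLOWED_MODELS = ("standard", "enhanced")


def build_transcribe_payload(
    url: str,
    webhook: Optional[str] = None,
    model: str = "standard",
    num_speakers: Optional[int] = None,
    language: Optional[str] = None,
    vocabulary: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for POST /v1/transcribe."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}")
    # ISO 639-1 codes are two letters, for example "en" or "de".
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError("language must be a two-letter ISO 639-1 code")
    payload: Dict[str, Any] = {"url": url, "model": model}
    optional = {
        "numSpeakers": num_speakers,
        "language": language,
        "vocabulary": vocabulary,
        "webhook": webhook,
    }
    # Assumption: an omitted field behaves like null (automatic detection).
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

The returned dictionary can be serialized with `json.dumps` and sent with the Content-Type header set to application/json.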

Response

200 - application/json

jobId
string

message
string

status
string
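The 200 response can be mapped onto a small typed structure, as in this Python sketch. The class and function names are hypothetical. The set of possible `status` values is not listed on this page, so the code keeps it as a plain string.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscribeResponse:
    job_id: str
    message: str
    status: str


def parse_transcribe_response(raw: bytes) -> TranscribeResponse:
    """Parse the JSON body of a 200 response from POST /v1/transcribe."""
    data = json.loads(raw)
    # A missing field raises KeyError, which surfaces schema mismatches early.
    return TranscribeResponse(
        job_id=data["jobId"],
        message=data["message"],
        status=data["status"],
    )
```

The `job_id` value is what you would pass to the Retrieve Job endpoint to check on the job later.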