The enhanced model builds on the standard model and adds LLM-based post-processing to improve transcription accuracy, produce more detailed diarization segments, and infer speaker labels or names.

The enhanced model can work well for languages on which the standard model is less accurate.

Limitations compared to the standard model:

  • The enhanced model is slower.
  • Unlike the standard model, the enhanced model does not output word-level timestamps or confidence scores.
  • The maximum input audio duration is currently 60 minutes (this will be increased in the near future).

The cost is $0.0005 per second of audio, equivalent to $1.80 per hour of audio.
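
To make the pricing arithmetic concrete, here is a minimal Python sketch that estimates the cost of a clip from its duration and rejects clips over the current 60-minute limit. The function and constant names are illustrative assumptions, not part of the API; only the per-second price and the duration limit come from this page.

```python
from decimal import Decimal

# Values taken from this page; the names themselves are hypothetical.
PRICE_PER_SECOND_USD = Decimal("0.0005")
MAX_DURATION_SECONDS = 60 * 60  # current 60-minute input limit


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated cost in USD for one audio file."""
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    if duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError("audio exceeds the current 60-minute limit")
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return PRICE_PER_SECOND_USD * duration_seconds


one_hour = estimate_cost(3600)     # equals 1.80 USD
ten_minutes = estimate_cost(600)   # equals 0.30 USD
```

The estimate covers successful transcriptions only, since failed transcriptions are not charged.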

Warning: The enhanced model is experimental and may fail or produce incorrect results (you won’t be charged if transcription fails). Feel free to try it out and let us know what you think (email [email protected] or reach out on X @thomas_mol).

Example

Request Example
  curl --request POST \
    --url https://api.spectropic.ai/v1/transcribe \
    --header 'Authorization: Bearer <apikey>' \
    --header 'Content-Type: application/json' \
    --data '{
      "url": "https://example.com/file.mp3",
      "model": "enhanced",
      "numSpeakers": 2,
      "language": "en",
      "vocabulary": "Spectropic, AI, LLama, Mistral, Whisper."
    }'
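
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name `build_transcribe_request` and its default arguments are assumptions, while the endpoint, headers, and JSON fields mirror the curl example. The code builds the request without sending it, because the response format is not described in this section.

```python
import json
import urllib.request

API_URL = "https://api.spectropic.ai/v1/transcribe"


def build_transcribe_request(api_key, audio_url, num_speakers=2,
                             language="en", vocabulary=""):
    """Build (but do not send) a POST request for the enhanced model.

    Hypothetical helper: field names are copied from the curl example.
    """
    payload = {
        "url": audio_url,
        "model": "enhanced",
        "numSpeakers": num_speakers,
        "language": language,
        "vocabulary": vocabulary,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_transcribe_request(
    "<apikey>",  # placeholder, replace with a real API key
    "https://example.com/file.mp3",
    vocabulary="Spectropic, AI, LLama, Mistral, Whisper.",
)
```

Passing `request` to `urllib.request.urlopen` would send it. Since the enhanced model is slower than the standard model, a generous timeout is probably sensible.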