- The enhanced model is slower than the standard model.
- Unlike the standard model, the enhanced model does not output word-level timestamps or confidence scores.
- Maximum input audio duration is currently 60 minutes (this will be increased in the near future).
$0.0005 per second of audio, equivalent to $1.80 per hour of audio.
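To make the pricing concrete, here is a minimal Python sketch that estimates the cost of a single file from its duration and rejects audio over the current 60-minute limit. The function name `estimate_cost` and both constants are hypothetical helpers written for illustration, not part of the API; the sketch assumes billing is strictly linear in seconds, with no rounding or minimum charge.

```python
from decimal import Decimal

# Assumed constants, taken from the figures stated on this page.
PRICE_PER_SECOND = Decimal("0.0005")  # USD per second of audio
MAX_DURATION_SECONDS = 60 * 60        # current 60-minute input limit


def estimate_cost(duration_seconds):
    """Estimate the USD cost of transcribing one audio file.

    Hypothetical helper: assumes cost is linear in duration.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError("audio exceeds the current 60-minute limit")
    # Convert via str so float inputs do not carry binary rounding noise.
    return PRICE_PER_SECOND * Decimal(str(duration_seconds))


if __name__ == "__main__":
    # One hour of audio at $0.0005 per second.
    print(f"${estimate_cost(3600):.2f}")
```

`Decimal` is used instead of floats so that small per-second amounts add up exactly when summing costs across many files.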
Warning: The enhanced model is experimental and may fail or produce incorrect results (you won’t be charged if transcription fails). Feel free to try it out and let us know what you think (by email at [email protected] or on X at @thomas_mol).
Example
Request Example