epicenter
[FR] Add support for new OpenAI STT models
Hi, OpenAI released new STT models a week ago and it would be nice if they could be used, instead of only the whisper-1 option.
I second this. It would be nice to have the SOTA models.
I am not sure "SOTA" necessarily means "best for me". I suppose this is very much a "YMMV" type of thing. I just tested gpt-4o-transcribe (couldn't be bothered with gpt-4o-mini-transcribe) from the AHK cURL app I previously linked.
I came away thoroughly disappointed. gpt-4o-transcribe does seem to transcribe faster, though not by a huge margin (4o-mini would likely be faster still). But subjectively, 4o was no more accurate for me than the tried-and-true whisper-1.
I know OpenAI states otherwise: https://au.finance.yahoo.com/news/openai-upgrades-transcription-voice-generating-170000157.html
Adding insult to injury, the way 4o handles prompting is a mess, to me anyway. Should I take a course on "prompt engineering"?
And 4o seems to hallucinate more (mind you I might have been able to play with "temperature" and what not ... but I didn't feel like adjusting 16 more parameters ...)
Again OpenAI says 4o "hallucinates" less. Maybe my left brain is "gaslighting" my right?
Anyhoo, I don't disagree that @braden-w might wish to add this "SOTA" endpoint when he hopefully gets some spare time.
But pardon me for not feeling at all "SOTA" about 4o-transcribe.
P.S., I really do wish to emphasize that gpt-4o-transcribe seems to handle prompting inconsistently, based on my experience using it this evening. May the rest of you have better luck with "SOTA".
And if any of you figure out a way to get 4o to "obey" prompting consistently, do please share your tips!
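For anyone who wants to repeat this kind of side-by-side test, here is a minimal sketch in Python. It assumes the official `openai` package is installed and `OPENAI_API_KEY` is set; `build_request`, `transcribe`, and `compare` are hypothetical helper names, not part of any app mentioned in this thread. It only shows how the same audio, prompt, and temperature could be sent to each model so the outputs can be compared by eye.

```python
from pathlib import Path
from typing import Optional

# Model names mentioned in this thread.
MODELS = ("whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe")


def build_request(model: str, prompt: Optional[str] = None,
                  temperature: Optional[float] = None) -> dict:
    """Collect the keyword arguments for one transcription request.

    Hypothetical helper: it only assembles parameters and sends nothing.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    params = {"model": model}
    if prompt:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = temperature
    return params


def transcribe(audio_path: str, model: str, prompt: Optional[str] = None,
               temperature: Optional[float] = None) -> str:
    """Send one request. Assumes the `openai` package and an API key are available."""
    from openai import OpenAI  # imported lazily so build_request works without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with Path(audio_path).open("rb") as audio:
        result = client.audio.transcriptions.create(
            file=audio, **build_request(model, prompt, temperature)
        )
    return result.text


def compare(audio_path: str, prompt: Optional[str] = None,
            temperature: float = 0.0) -> dict:
    """Transcribe the same file with every model, using identical settings."""
    return {m: transcribe(audio_path, m, prompt, temperature) for m in MODELS}
```

Calling `compare("sample.wav", prompt="Names: Braden, AHK, cURL")` (the file name and prompt are placeholders) would return one transcript per model. Holding the prompt and temperature fixed across models is the point: it keeps any difference in accuracy or hallucination attributable to the model rather than the settings.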
Honestly, same. Whisper seems to catch words better than the new models.
Hey everyone, thanks for the suggestion! I finally returned to development, and this is now included in the v7 release.
Please open a new issue if you have any more requests! Thank you again.