cog-whisper
Significant latency regression in latest release
Hi @chenxwh and replicate,
The latest available version seems to have a significant latency regression from the version that I have been using for some time now. Trying the same large-v1 model (New) and large model (Old) on what I believe is a warm model seems to have drastically different performance characteristics.
From the replicate runs page, New is 10x slower than old on equivalent data
| Version | ID | Model | Source | Status | Run Time | Created |
|---|---|---|---|---|---|---|
| New | imiwp7wkk… | openai/whisper | API | Succeeded | 57.6 seconds | a minute ago |
| New | hhj3ijrde… | openai/whisper | API | Succeeded | 44.4 seconds | 2 minutes ago |
| Old | fdhdfyvmf… | openai/whisper | API | Succeeded | 3.0 seconds | 6 minutes ago |
In my own metrics you can also see a latency shift in the ~10x range:
New:
Old:
New version sha: 23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2
Old version sha: b6e7ea7aef18444c29d974fee51ffc1e47e1699cfaf4e5cde0ba47a8db74f3b6
Looking deeper, I decided to "bisect" versions with the following test:
- Warm up the model with one request
- When the warm up request returns, send another request and use that as a measure of performance
- Mark the version as bad if transcription time is >30s, otherwise mark it as good
Bad:
23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2
Good:
089ea17a12d0b9fc2f81d620cc6e686de7a156007830789bf186392728ac25e8
30414ee7c4fffc37e260fcab7842b5be470b9b840f2b608f5baa9bbef9a259ed
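The test protocol above (warm up once, time a second request, classify against a 30-second threshold) can be sketched as a small Python helper. The names `classify` and `bisect_version` are hypothetical, and the stubbed predictor stands in for a real Replicate API call:

```python
import time

# Threshold from the test protocol above: >30 s transcription time is "bad".
BAD_THRESHOLD_S = 30.0

def classify(elapsed_s: float) -> str:
    """Mark a timed run as good or bad per the bisect criterion."""
    return "bad" if elapsed_s > BAD_THRESHOLD_S else "good"

def bisect_version(predict, audio_url: str) -> str:
    """Warm up the model with one request, then time a second one.

    `predict` is any callable that runs a single transcription; in a real
    test it would wrap a Replicate prediction for one model version.
    """
    predict(audio_url)                      # warm-up request; result discarded
    start = time.monotonic()
    predict(audio_url)                      # measured request
    return classify(time.monotonic() - start)

# Example with a stubbed, instant predictor (no network access needed):
stub = lambda url: "transcript"
print(bisect_version(stub, "https://example.com/audio.mp3"))  # prints "good"
```

Running this once per version sha, in the spirit of `git bisect`, narrows the regression to a single release.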
So it really looks like the latest change introduced a regression. I'm going to pin my version to an earlier one, but thought I would let the team know.
I scanned the code and don't see anything obvious. Could this be something that changed in a new cog release?
I've noticed a significant slow-down too.
Hey y'all. Thanks for reporting this issue and sharing your analysis. I've added this to our internal board to discuss when the team gets back from the holiday break next week.
@zeke this get prioritized? :)
Not sure. Let me check with the team!
Sounds like @andreasjansson was planning to look into this. I'll defer to him.
Also @daanelson and @evilstreak :)
hey! just a quick heads up for those interested that we're working on this. Think we have a fix to get large-v2 out without a regression; needs some testing to confirm. will keep you posted.
Sweet!!
@daanelson this out yet?
@maccman not yet, unfortunately. Should have time to dig in some more next week.
@daanelson how about now? :)
@maccman I've set up a whisper version which hosts only large-v2 here: https://replicate.com/daanelson/whisper-sandbox
Feel free to give it a spin and let me know how it goes; haven't seen any latency spikes in testing so far.