cog-whisper
Significant latency regression in latest release
Hi @chenxwh and replicate,
The latest available version seems to have a significant latency regression from the version that I have been using for some time now. Trying the same large-v1 model (New) and large model (Old) on what I believe is a warm model seems to have drastically different performance characteristics.
From the replicate runs page, New is 10x slower than old on equivalent data
| Version | ID | Model | Source | Status | Run Time | Created |
|---|---|---|---|---|---|---|
| New | imiwp7wkk… | openai/whisper | API | Succeeded | 57.6 seconds | a minute ago |
| New | hhj3ijrde… | openai/whisper | API | Succeeded | 44.4 seconds | 2 minutes ago |
| Old | fdhdfyvmf… | openai/whisper | API | Succeeded | 3.0 seconds | 6 minutes ago |
In my own metrics you can also see a latency shift in the ~10x range:
New:
Old:
New version sha: 23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2
Old version sha: b6e7ea7aef18444c29d974fee51ffc1e47e1699cfaf4e5cde0ba47a8db74f3b6
Looking deeper, I decided to "bisect" versions with the following test:
- Warm up the model with one request
- When the warm up request returns, send another request and use that as a measure of performance
- Mark the version as bad if transcription time is >30s, otherwise mark it as good
Bad:
23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2
Good:
089ea17a12d0b9fc2f81d620cc6e686de7a156007830789bf186392728ac25e8
30414ee7c4fffc37e260fcab7842b5be470b9b840f2b608f5baa9bbef9a259ed
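The test protocol above (warm up once, time a second request, classify against a 30-second threshold) can be sketched as a small Python helper. The names `classify` and `bisect_version` are hypothetical, and the stubbed predictor stands in for a real Replicate API call:

```python
import time

# Threshold from the test protocol above: >30 s transcription time is "bad".
BAD_THRESHOLD_S = 30.0

def classify(elapsed_s: float) -> str:
    """Mark a timed run as good or bad per the bisect criterion."""
    return "bad" if elapsed_s > BAD_THRESHOLD_S else "good"

def bisect_version(predict, audio_url: str) -> str:
    """Warm up the model with one request, then time a second one.

    `predict` is any callable that runs a single transcription; in a real
    test it would wrap a Replicate prediction for one model version.
    """
    predict(audio_url)                      # warm-up request; result discarded
    start = time.monotonic()
    predict(audio_url)                      # measured request
    return classify(time.monotonic() - start)

# Example with a stubbed, instant predictor (no network access needed):
stub = lambda url: "transcript"
print(bisect_version(stub, "https://example.com/audio.mp3"))  # prints "good"
```

Running this once per version sha, in the spirit of `git bisect`, narrows the regression to a single release.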
So it really looks like the latest change introduced a regression. I'm going to pin my version to an earlier one, but thought I would let the team know.
I scanned the code and don't see anything obvious. Could this be something that changed in a new cog release?
I've noticed a significant slow-down too.
Hey y'all. Thanks for reporting this issue and sharing your analysis. I've added this to our internal board to discuss when the team gets back from the holiday break next week.
@zeke this get prioritized? :)
Not sure. Let me check with the team!
Sounds like @andreasjansson was planning to look into this. I'll defer to him.
Also @daanelson and @evilstreak :)
hey! just a quick heads up for those interested that we're working on this. Think we have a fix to get large-v2 out without a regression; needs some testing to confirm. will keep you posted.
Sweet!!
@daanelson this out yet?
@maccman not yet, unfortunately. Should have time to dig in some more next week.
@daanelson how about now? :)
@maccman I've set up a whisper version which hosts only large-v2 here: https://replicate.com/daanelson/whisper-sandbox
Feel free to give it a spin and let me know how it goes; haven't seen any latency spikes in testing so far.