
Significant latency regression in latest release

Open philkuz opened this issue 2 years ago • 13 comments

Hi @chenxwh and replicate,

The latest available version seems to have a significant latency regression compared to the version I have been using for some time. Running the same input through the large-v1 model (New) and the large model (Old), on what I believe is a warm model in both cases, shows drastically different performance characteristics.

From the Replicate runs page, New is ~10x slower than Old on equivalent data:

| Version | ID | Model | Source | Status | Run Time | Created |
| --- | --- | --- | --- | --- | --- | --- |
| New | imiwp7wkk… | openai/whisper | API | Succeeded | 57.6 seconds | a minute ago |
| New | hhj3ijrde… | openai/whisper | API | Succeeded | 44.4 seconds | 2 minutes ago |
| Old | fdhdfyvmf… | openai/whisper | API | Succeeded | 3.0 seconds | 6 minutes ago |

My own metrics also show a latency shift in the ~10x range. New: [image] Old: [image]

New version sha: 23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2
Old version sha: b6e7ea7aef18444c29d974fee51ffc1e47e1699cfaf4e5cde0ba47a8db74f3b6

Looking deeper, I decided to "bisect" versions with the following test:

  1. Warm up the model with one request.
  2. When the warm-up request returns, send another request and use that as the measure of performance.
  3. Mark the version as bad if transcription time is >30s, otherwise mark it good.

Bad: 23241e5731b44fcb5de68da8ebddae1ad97c5094d24f94ccb11f7c1d33d661e2
Good: 089ea17a12d0b9fc2f81d620cc6e686de7a156007830789bf186392728ac25e8
Good: 30414ee7c4fffc37e260fcab7842b5be470b9b840f2b608f5baa9bbef9a259ed
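For anyone who wants to repeat the bisect, the steps above can be sketched roughly as follows. This is a minimal sketch, not the exact script I used: the helper names (`time_warm_request`, `classify`) are hypothetical, and wiring `run_once` to an actual Replicate prediction call for a given version sha is left out.

```python
import time

# Threshold from step 3: >30 s transcription time marks a version as bad.
BAD_THRESHOLD_S = 30.0

def classify(run_time_s: float) -> str:
    """Classify a version as 'bad' or 'good' based on warm-request run time."""
    return "bad" if run_time_s > BAD_THRESHOLD_S else "good"

def time_warm_request(run_once) -> float:
    """Send one warm-up request, then time a second request.

    `run_once` is any zero-argument callable that performs a single
    transcription request against the version under test.
    """
    run_once()                          # step 1: warm up the model
    start = time.monotonic()
    run_once()                          # step 2: measured warm request
    return time.monotonic() - start
```

For example, `classify(time_warm_request(run_once))` returns `"bad"` for the ~57 s runs above and `"good"` for the ~3 s run.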

So really looks like the latest change added a regression. Going to revert my versioning away from the latest, but thought I would let the team know.

philkuz avatar Dec 13 '22 04:12 philkuz

I scanned the code and don't see anything obvious. Could this be caused by something that changed in a new cog release?

philkuz avatar Dec 13 '22 04:12 philkuz

I've noticed a significant slow-down too.

maccman avatar Dec 14 '22 10:12 maccman

Hey y'all. Thanks for reporting this issue and sharing your analysis. I've added this to our internal board to discuss when the team gets back from the holiday break next week.

zeke avatar Dec 28 '22 20:12 zeke

@zeke this get prioritized? :)

maccman avatar Jan 13 '23 13:01 maccman

Not sure. Let me check with the team!

zeke avatar Jan 13 '23 20:01 zeke

Sounds like @andreasjansson was planning to look into this. I'll defer to him.

zeke avatar Jan 13 '23 20:01 zeke

Also @daanelson and @evilstreak :)

zeke avatar Jan 14 '23 05:01 zeke

hey! just a quick heads up for those interested that we're working on this. Think we have a fix to get large-v2 out without a regression, needs some testing to confirm. will keep you posted.

daanelson avatar Jan 21 '23 00:01 daanelson

> hey! just a quick heads up for those interested that we're working on this. Think we have a fix to get large-v2 out without a regression, needs some testing to confirm. will keep you posted.

Sweet!!

maccman avatar Jan 21 '23 21:01 maccman

@daanelson this out yet?

maccman avatar Feb 01 '23 12:02 maccman

@maccman not yet, unfortunately. Should have time to dig in some more next week.

daanelson avatar Feb 10 '23 19:02 daanelson

@daanelson how about now? :)

maccman avatar Feb 22 '23 03:02 maccman

@maccman I've set up a whisper version which hosts only large-v2 here: https://replicate.com/daanelson/whisper-sandbox

Feel free to give it a spin and let me know how it goes; haven't seen any latency spikes in testing so far.

daanelson avatar Feb 22 '23 05:02 daanelson