Max Bain
Install with video2dataset, it's much faster https://github.com/iejMac/video2dataset
I see, what are the Hugging Face Whisper outputs? If it outputs a list of dictionaries with "text", "start", and "end", then it can feed straight into whisperx.align. see ```python import...
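A rough sketch of that reshaping step, assuming the Hugging Face ASR pipeline was run with `return_timestamps=True` and returned a dict with a `"chunks"` list of `{"text", "timestamp": (start, end)}` entries. The helper name `chunks_to_segments` is made up for illustration, and dropping chunks with a missing timestamp is just one possible choice:

```python
from typing import Any, Dict, List


def chunks_to_segments(hf_output: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reshape pipeline chunks into a list of {"text", "start", "end"} dicts.

    Hypothetical helper, not part of whisperX or transformers.
    """
    segments = []
    for chunk in hf_output.get("chunks", []):
        start, end = chunk["timestamp"]
        # A chunk can come back with a missing boundary (None);
        # this sketch skips those rather than guessing a time.
        if start is None or end is None:
            continue
        segments.append(
            {"text": chunk["text"].strip(), "start": float(start), "end": float(end)}
        )
    return segments


# Hand-written example input, shaped like the assumed pipeline output.
example = {
    "text": " Hello there. How are you? Bye",
    "chunks": [
        {"text": " Hello there.", "timestamp": (0.0, 2.5)},
        {"text": " How are you?", "timestamp": (2.5, 4.0)},
        {"text": " Bye", "timestamp": (4.0, None)},
    ],
}

segments = chunks_to_segments(example)
# `segments` is now in the list-of-dicts shape described above and could be
# passed to whisperx.align together with the alignment model and audio.
```

The actual `whisperx.align` call is left out because its arguments depend on the installed whisperX version.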
v3 uses the faster-whisper backend, which can load finetuned Whisper weights: https://github.com/guillaumekln/faster-whisper/blob/d889345e071de21a83bdae60ba4b07110cfd0696/README.md?plain=1#L142 Feel free to open a pull request adding this functionality; it would require passing a custom model_path.
Same here :(
Quality is much better without conditioning on previous text: https://github.com/openai/whisper/discussions/679#discussioncomment-4449150 Similarly, whisperx requires this because there's just too much hallucination otherwise. >just remove all the input_ids if asked Yes trying...
Hacked attempt here, seems to work on my end -- can now run very fast whisper without hallucination :') https://github.com/huggingface/transformers/pull/21491/commits/cf2ad49fae43e8355655c5392d4dca0bdd1a733e
@ArthurZucker @sanchit-gandhi Thanks for the review and comments, sanchit! Unfortunately I am pushing for the ICCV/Interspeech deadlines for the next couple of weeks, so I don't have time at the moment to...
Hi, sure, you can see some runs for MSRVTT here: https://app.neptune.ai/m-bain/frozen/experiments?split=tbl&dash=charts&viewId=95e7e8f0-79f1-48a4-9bd5-e1017c21309b Yeah, a smaller batch size will take longer to converge -- and intuitively I would think it gives worse performance...
Need a wav2vec2.0 model finetuned on Chinese. There seem to be some on Hugging Face: https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn Currently whisperX doesn't support Hugging Face pipelines though; when I have the bandwidth I can try add...
Please see the recent commit https://github.com/m-bain/whisperX/commit/e909f2f766b23b2000f2d95df41f9b844ac53e49, in large part thanks to @yasutak 🚀 I tried adding this Chinese model but it does not seem able to align properly. It's hard for me...