Justin Uberti
Still hitting these issues with v0.3: ``` # just infer --text_only --prompt hi -q 8 -m fixie-ai/ultravox-v0_3 poetry run python -m ultravox.tools.infer_tool --text_only --prompt hi -q 8 -m fixie-ai/ultravox-v0_3 config.json:...
Notes from @farzadab: > The error "weight is on the meta device" means that you're likely running out of memory when doing the quantization somehow. The "meta" device is an...
Deprioritized for the time being.
The goal here would just be to allow the model to see interleaved text during stage 1 training, which should help it learn text-audio invariance. So once we have the...
Interesting. Do you have a sample output dataset I could take a look at?
Hmm, I listened to a few clips and I wonder if the merging is the right way to do this. The audio clips tend to be fairly different with their...
OK, I can get behind that. I still think this warrants further investigation though:
```
eval/covost2-asr-es_en.2k-asr:0.14595425715933116
eval/covost2_long_audio-asr-combine-5-es_en.2k-asr:0.1345223909283106
eval/covost2_long_audio-asr-combine-10-es_en.2k-asr:0.20847972323659428
```
It seems odd that combining 5x is much better than combining...
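For a sense of scale, here is a quick sketch that computes how far each combined-audio score sits from the uncombined one. The three values are copied from the eval output quoted above; the short labels and the `relative_change` helper are my own naming, not anything from the eval tooling.

```python
# Scores copied verbatim from the eval output quoted above.
# The dictionary keys are shortened labels chosen for this sketch.
scores = {
    "baseline": 0.14595425715933116,
    "combine-5": 0.1345223909283106,
    "combine-10": 0.20847972323659428,
}


def relative_change(value: float, reference: float) -> float:
    """Return the signed fractional change of value with respect to reference."""
    return (value - reference) / reference


base = scores["baseline"]
for name in ("combine-5", "combine-10"):
    change = relative_change(scores[name], base)
    print(f"{name}: {change:+.1%} relative to baseline")
```

The 5x score moves only a few percent from the baseline while the 10x score moves by tens of percent, so the two settings do not shift by similar amounts.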
Hmm, that's kind of surprising. Perhaps the encoder hidden states end up the same when repeated audio is used?
Hi @zqhuang211, there are a lot of interesting ideas in this PR but it has far too much surface area. The key focus here should be on the data/ changes...
The application can monitor loss and apply its own FEC, which in many ways allows for app-specific customization.
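As a rough illustration of what app-level FEC could look like, here is a minimal sketch using single XOR parity: one parity packet per group of equal-length packets, which can restore at most one lost packet in that group. All names here (`make_parity`, `recover`, `choose_group_size`) and the group-size thresholds are hypothetical and only show the idea; a real application would likely pick a stronger code and tune it to its own traffic.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet by XOR-ing every packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Restore a group in which exactly one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("single parity can only repair exactly one loss")
    present = [p for p in received if p is not None]
    # XOR of the parity with all surviving packets yields the lost packet.
    restored = reduce(xor_bytes, present, parity)
    repaired = list(received)
    repaired[missing[0]] = restored
    return repaired


def choose_group_size(loss_rate: float) -> Optional[int]:
    """Hypothetical policy: more observed loss means smaller groups (more parity).

    Returns None when no loss has been observed, meaning no FEC is sent.
    The bounds 2 and 10 are arbitrary choices for this sketch.
    """
    if loss_rate <= 0:
        return None
    return max(2, min(10, int(1 / loss_rate)))


packets = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(packets)
received = [b"aaaa", None, b"cccc"]  # the second packet was dropped
print(recover(received, parity) == packets)
```

The point of doing this in the application is that the policy in `choose_group_size` can be whatever suits the app, for example protecting only the packets it cares about most.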