Piotr Żelasko
Follow-up to #863 and #1309. This version seems to work as intended: it consistently picks the same buckets on each DDP rank. It depends on good `duration_bins` initialization (i.e....
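The idea of every DDP rank picking the same bucket can be illustrated with a minimal sketch. It assumes, which the PR description does not confirm, that bucket selection is driven by an RNG seeded only from values shared by all ranks (a global seed and the epoch), and that `duration_bins` are sorted bucket edges in seconds. `assign_bucket` and `pick_buckets` are hypothetical helpers, not Lhotse's API.

```python
import bisect
import random
from typing import List, Sequence


def assign_bucket(duration: float, duration_bins: Sequence[float]) -> int:
    """Hypothetical helper: map a cut duration to a bucket index.

    With sorted edges [b0, b1, ...], bucket 0 holds durations < b0,
    bucket i holds [b(i-1), b(i)), and the last bucket is open-ended,
    so there are len(duration_bins) + 1 buckets in total.
    """
    return bisect.bisect_right(duration_bins, duration)


def pick_buckets(num_buckets: int, num_steps: int, seed: int, epoch: int) -> List[int]:
    """Hypothetical helper: choose which bucket to sample from at each step.

    The RNG is seeded only from (seed, epoch), never from the rank,
    so every DDP rank computes the identical sequence of bucket choices.
    """
    rng = random.Random(seed + epoch)
    return [rng.randrange(num_buckets) for _ in range(num_steps)]


if __name__ == "__main__":
    duration_bins = [2.0, 5.0, 10.0]  # hypothetical bucket edges in seconds
    num_buckets = len(duration_bins) + 1
    # Simulate four ranks: each computes its own schedule independently.
    per_rank = [pick_buckets(num_buckets, 8, seed=42, epoch=0) for _rank in range(4)]
    assert all(schedule == per_rank[0] for schedule in per_rank)
    print(per_rank[0])
```

In this sketch only the bucket schedule is shared; each rank would still draw its own cuts from the chosen bucket. That is one way to read why the approach depends on good `duration_bins`: if a bucket is nearly empty, some rank may run out of data for it while the others do not.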
# What does this PR do ? This PR is for tracking the changes in the speech-llm main development branch relative to main. **Collection**: multimodal # Changelog - Add specific line by...
# What does this PR do ? Add a one line overview of what this PR aims to accomplish. **Collection**: [Note which collection this PR will affect] # Changelog -...
Lhotse CI is failing during lilcom installation. I'm not 100% sure why, but I think it is related to the numpy 2.0 release. First, Lhotse tests were failing on `numpy not available`,...
# What does this PR do ? This PR extends NeMo and SpeechLLM with the following: * EMMeTT (optimized training) support for SpeechLLM * Support for joint audio and text...
> [!IMPORTANT] > The `Update branch` button must only be pressed on very rare occasions. > An outdated branch is never blocking the merge of a PR. > Please reach...
I've noticed that in recent CI builds of lhotse, kaldifeat fails to compile. I'm disabling kaldifeat-related tests for now, but if you know how to resolve this, I'll re-enable them....
# What does this PR do ? Adds tests and fixes inconsistency in ASR feature extractor when processing the same input with and without padding. Specifically: * removes a Dirac-delta-like...
# What does this PR do ? SpeechLM2 improvements: * SALM ASR eval: choose English/Basic/None text normalizer, remove hardcoded user prompt and make customizable * Qwen prompt formatter definition +...