Piotr Żelasko
> Something else: the internal code of ActivationBalancer has a few elementwise expressions that don't need gradient, and which could be sped up by writing a kernel. Does anyone know...
> That's interesting.
> Have you used CuPy? I'm wondering how easy the installation process is.
> (Things that have a dependency on the CUDA toolkit tend to be tricky...
I'd be more concerned that your CTC losses go to infinity after the first batch; I'd double-check your data/transcripts/lexicon/etc.
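For context on why CTC losses come out as infinity: when a target sequence is longer than the number of input frames (repeated labels need extra blank frames on top of that), no valid alignment exists, and `torch.nn.CTCLoss` returns `inf` by default. A minimal sketch with toy tensors, not tied to any particular recipe:

```python
import torch
import torch.nn as nn

# Toy dimensions: 4 input frames, 5 classes (index 0 is the blank), batch of 1.
T, C, N = 4, 5, 1
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)

# 6 target labels cannot fit into 4 frames, so no valid alignment exists.
targets = torch.tensor([[1, 2, 3, 4, 1, 2]])
input_lengths = torch.tensor([T])
target_lengths = torch.tensor([6])

ctc = nn.CTCLoss(blank=0)  # zero_infinity=False by default
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss)  # tensor(inf)
```

If many utterances hit this, it usually points at transcripts that are too long for the acoustic features, a lexicon producing overly long token sequences, or wrong segment boundaries.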
I don't recall anything related to this being fixed; to me it looks like one of your cuts has two supervisions (or you used the `CutConcatenate` transform). In these cases it's...
... as a side note, maybe it makes sense to add a separate script to each recipe that validates the major assumptions about the data (single supervision per cut, supervisions...
Very cool! You can find data preparation recipes for both LibriTTS and LJSpeech (and many more) in Lhotse. https://github.com/lhotse-speech/lhotse/tree/master/lhotse/recipes
Can you try increasing the number of dataloader workers? Perhaps that’s the bottleneck. If you want to use fbank as a layer you can modify the code to use https://github.com/lhotse-speech/lhotse/blob/eb9e6b115729697c66c0a7f5f7ba08984b6a1ee5/lhotse/features/kaldi/layers.py#L476...
You are getting the best gains by increasing dataloader workers, so it's likely an I/O bottleneck; using webdataset or Lhotse Shar may help. BTW, the fbank I posted is also...
Can you reduce the number of workers (especially for on-the-fly features) and see if it helps?
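For reference, the knob in question is `num_workers` on PyTorch's `DataLoader`; a generic sketch with a toy dataset standing in for on-the-fly feature extraction (not a Lhotse sampler):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    # Stands in for a dataset that computes features on the fly.
    def __len__(self):
        return 32

    def __getitem__(self, idx):
        return torch.randn(80)  # fake 80-dim feature vector

ds = ToyDataset()
# More workers is not always better: each worker has its own CPU cost,
# so try 0 (main process), then 2, 4, ... and measure actual throughput.
loader = DataLoader(ds, batch_size=8, num_workers=0)
batches = list(loader)
print(len(batches))  # 4
```

With CPU-heavy per-item work, too many workers can oversubscribe cores (especially when each worker also spawns BLAS/OMP threads) and slow things down.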
AFAIK there's no recipe at this time, but it shouldn't be too involved:
- export your train cutset to Lhotse Shar format, e.g. with `lhotse shar export --help`
- adjust...