Piotr Żelasko

Results: 523 comments of Piotr Żelasko

You can do:
- `print(sampler.state_dict())`
- `print(torch.load("sampler_checkpoint.pt"))`

Also it might be helpful to see the output of `print(sampler.get_report())` or `print(sampler.diagnostics.get_report(per_epoch=True))` for even more detail.
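For reference, a minimal sketch of how those commands fit together; the sampler class, `max_duration` value, and manifest/checkpoint paths are assumptions for illustration, not taken from this thread:

```python
import torch
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

# Hypothetical manifest path and sampler configuration.
cuts = CutSet.from_file("cuts.jsonl.gz")
sampler = DynamicBucketingSampler(cuts, max_duration=200.0, shuffle=True)

# State held by the sampler in the current process.
print(sampler.state_dict())

# State that was written to disk at checkpoint time, for comparison.
print(torch.load("sampler_checkpoint.pt"))

# Aggregated sampling statistics (e.g. kept/discarded cuts and batches).
print(sampler.get_report())
print(sampler.diagnostics.get_report(per_epoch=True))
```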

Sorry for the late answer. I wonder why it says `batches kept 97601/97601` when the checkpoint is from 2000 steps? The sampler in the main process could have processed at...

I find it weird that your `kept_batches` is 59270, but the checkpoint is written as step 16000. The ratio is ~3.7, which suggests that maybe you used 4 GPUs to...

> No, I resumed training with the exact same hardware machine, same dataset, and same max-duration, using only 1 GPU. Yeah indeed it was weird, since at batch 1, my...

Okay, interesting. It looks like during the iteration, the `kept_batches` count is being properly incremented (the diff between 4k and 8k steps is 4k), but the initial offset is suspiciously...

I think I finally realized what the issue is... please try again with this PR: https://github.com/lhotse-speech/lhotse/pull/854. You will need to start a new training run for the fix to kick in,...

No problem - it passes all the tests, so I will merge it; hopefully it will resolve the issues others have reported as well.

Yes, multiple cuts may refer to the same `Features` manifest: think of a 30-minute recording for which you precomputed 30 minutes' worth of feature frames, with cuts referring to subsets of those....
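A minimal sketch of that relationship, assuming a cuts manifest with precomputed features at a hypothetical path `cuts.jsonl.gz` (the path and the chosen cut are assumptions for illustration):

```python
from lhotse import CutSet

# Hypothetical path to a cuts manifest whose cuts have precomputed features.
cuts = CutSet.from_file("cuts.jsonl.gz")

# Take one long cut, e.g. spanning a full 30-minute recording.
long_cut = next(iter(cuts))

# Carve out two non-overlapping windows of the same cut.
first = long_cut.truncate(offset=0.0, duration=10.0)
second = long_cut.truncate(offset=10.0, duration=10.0)

# Both windows still point at the same precomputed feature storage;
# only the offset/duration view into the feature matrix differs.
print(first.features.storage_path, first.features.storage_key)
print(second.features.storage_path, second.features.storage_key)
```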

I'm afraid we'd have to re-train it for every new dataset, task, and model architecture, until we accumulate enough metadata about OOMs across various experiments that we can start to...

> In some setups there is the opposite issue: that due to fixed-size memory overheads per sequence, minibatches of very short utterances tend to blow up the training. This can...
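One way to guard against that in Lhotse samplers (a sketch for illustration, not necessarily the resolution reached in this thread; the manifest path and constraint values are assumptions) is to cap the number of sequences per minibatch with `max_cuts` alongside the `max_duration` constraint:

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

cuts = CutSet.from_file("cuts.jsonl.gz")  # hypothetical manifest path

# max_duration bounds the total speech time per batch, while max_cuts bounds
# the number of sequences, so batches of many very short utterances cannot
# grow the per-sequence overhead without limit.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=200.0,
    max_cuts=64,
    shuffle=True,
)
```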