
Sometimes batches are created which do not have the same number of supervisions and inputs

Open RuABraun opened this issue 1 year ago • 2 comments

Traceback (most recent call last):
  File "train.py", line 1019, in <module>
    main()
  File "train.py", line 1012, in main
    run(rank=0, world_size=1, args=args)
  File "train.py", line 867, in run
    scan_pessimistic_batches_for_oom(
  File "train.py", line 977, in scan_pessimistic_batches_for_oom
    loss, _ = compute_loss(
  File "train.py", line 542, in compute_loss
    simple_loss, pruned_loss = model(
  File "/home/rudolf/miniconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/ssd1/team/tmp/rudolf/icefall/ch/model.py", line 117, in forward
    assert x.size(0) == x_lens.size(0) == y.dim0
AssertionError

I did do trim_to_supervisions(). I'm trying to track this down.

As an aside, I'm surprised by what I find to be an unusual design for data loading (not using the batch_sampler argument of the DataLoader, the sampler rather than the dataset containing the data, collation happening in dataset.__getitem__, etc.).

edit: think it's because I forgot --discard-overlapping

edit2: still failing, some cuts now don't have supervisions

this is how I created the cuts manifest:

    recording_set, supervision_set, _ = load_kaldi_data_dir(kdata, 16000, num_jobs=4)
    cuts = CutSet.from_manifests(recordings=recording_set, supervisions=supervision_set)
    cuts = cuts.trim_to_supervisions(keep_overlapping=False)
    cuts = cuts.truncate(offset_type='start', max_duration=60.0, keep_excessive_supervisions=False)
    cuts = cuts.compute_and_store_features(Fbank(), storage_path=outf + '-feats', num_jobs=6, storage_type=LilcomChunkyWriter)
    cuts.to_file(outf + '.jsonl.gz')

edit3: it has to do with the duration of the recording being shorter than the duration of a supervision, presumably somehow because of using sox bla.mp3 .. |

edit4: Okay, so for some reason the sox resampling causes the file to be slightly shorter. This means the interval taken from the [kaldi] segments file is too long, so the supervision gets dropped later when trimming. Not sure what the right fix for this is. My quick fix is to make sure the supervision duration does not exceed the recording duration; if the difference is bigger than 0.1 seconds, an error is thrown.
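
A minimal sketch of how such mismatches could be listed before settling on a threshold. It uses plain Python stand-ins rather than Lhotse objects: `Segment`, `find_overruns`, and the sample values are all hypothetical, and it compares the segment end time (start plus duration) against the recording duration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    """Hypothetical stand-in for a supervision; times are in seconds."""
    seg_id: str
    recording_id: str
    start: float
    duration: float


def find_overruns(
    segments: List[Segment], rec_durations: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (seg_id, overrun_seconds) for segments that end after their recording, largest first."""
    overruns = []
    for seg in segments:
        overrun = (seg.start + seg.duration) - rec_durations[seg.recording_id]
        if overrun > 0:
            # Round only for readable reporting.
            overruns.append((seg.seg_id, round(overrun, 3)))
    return sorted(overruns, key=lambda item: item[1], reverse=True)


# Made-up example data: two recordings, three segments.
segments = [
    Segment("utt1", "rec1", 0.0, 5.0),
    Segment("utt2", "rec1", 5.0, 5.04),  # ends slightly past the audio
    Segment("utt3", "rec2", 0.0, 3.5),   # ends well past the audio
]
rec_durations = {"rec1": 10.0, "rec2": 3.0}

overruns = find_overruns(segments, rec_durations)
print(overruns)
```

Sorting by overrun makes it easy to see whether the mismatches are all tiny (resampling rounding) or whether some point to genuinely broken segment times.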

RuABraun, Jul 25 '22 10:07

Regarding the design, some motivation is provided here: https://lhotse.readthedocs.io/en/latest/datasets.html#about-lhotses-datasets-and-samplers

As for Kaldi imports, the segments file, Sox, and MP3, I’ve found in the past that the duration information is quite unreliable. Usually some manual/data-specific step is required to fix it. If you have any suggestions for how we can improve this experience, they would be welcome.

pzelasko, Jul 26 '22 19:07

Cheers for the link, that helps me understand the motivation!

Okay give me a bit to think about it and I'll get back to you.

RuABraun, Jul 27 '22 12:07

So I did a hacky fix for this, after the supervision set is created I have a loop that adjusts the supervision duration and deletes if necessary:

    # `durations` is assumed to map recording_id -> actual audio duration in seconds.
    to_del = []
    for sup in supervision_set:
        recording_id = sup.recording_id
        if durations[recording_id] < sup.duration:
            if durations[recording_id] + 0.1 > sup.duration:
                # Mismatch below 0.1 s: clamp the supervision to the recording length.
                sup.duration = durations[recording_id]
            else:
                print(f'{sup.id} {recording_id} {sup.duration} {durations[recording_id]} duration mismatch!')
                to_del.append((sup.id, recording_id))
    # Delete after the loop so the set is not modified while iterating over it.
    for sup_id, recording_id in to_del:
        del supervision_set.segments[sup_id]
        del recording_set.recordings[recording_id]

Not the nicest solution, but it works. What do you think?

RuABraun, Sep 17 '22 11:09

Hmm I just remembered, you can also try setting a larger default duration tolerance:

https://github.com/lhotse-speech/lhotse/blob/4a0436098291d9aa5243a053199738c1908ae90b/lhotse/audio.py#L73-L97

Would this help? I could expose it via environment variable to make it easier to tweak without changing the code for Lhotse CLIs.
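
For illustration only, a duration tolerance boils down to a comparison like the one below. This is a generic sketch, not Lhotse's code: `duration_matches` and its parameters are hypothetical names, and the numbers are made up.

```python
def duration_matches(expected: float, actual: float, tolerance: float) -> bool:
    """Return True if two durations (in seconds) differ by at most `tolerance`."""
    return abs(expected - actual) <= tolerance


# A manifest that claims 10.04 s for audio that is really 10.0 s long:
strict = duration_matches(10.04, 10.0, tolerance=0.025)
relaxed = duration_matches(10.04, 10.0, tolerance=0.1)
print(strict, relaxed)
```

With the stricter tolerance the mismatch is rejected; with the larger one it is accepted. That is the effect a larger tolerance would have on slightly shortened files.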

pzelasko, Sep 20 '22 01:09

I will try that out!

RuABraun, Sep 20 '22 10:09