snowfall
snowfall copied to clipboard
Naming problem RE sequence_idx
I just realized that there is a slight confusion in one of our naming conventions right now. In certain situations we are using sequence_idx to mean the index into the list of supervisions. But we also use it to mean the index into the feature sequences (as in supervisions['sequence_idx']) and these are not always identical due to batching-- particularly, I think, with more recent versions of lhotse. This is a potential cause of bugs.
E.g. this happens with our use of 'seqframe_idx' in intersectio code in k2, where the "seq" is implicitly the same as "sequence_idx", and yet it should really be, I think, "supervision_idx" if we are to avoid this source of confusion. I am thinking of making that change; I don't believe we are yet making use of this in Snowfall. [But on the other hand, since k2 itself has no notion of "supervison", it's not clear that the name "supframe_idx" would make sense.] Not quite sure what to do. Fangjun, make sure this wasn't an issue in your experiments. Right now some of the values of supervisions['sequence_idx'] are like this: tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 17, 18, 19, 20, 21, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 28, 29, 30, 31, 31, 32, 33])
Not quite sure what to do. Fangjun, make sure this wasn't an issue in your experiments.
It is not a problem for me. I am using path_idx when a sequence contains multiple paths (i.e., sub-sequences).
FWIW I believe we were using the "cut concatenation" mechanism that packs multiple cuts+supervisions in a single "sequence" for several months now. IF there was an issue, I think all the numbers anybody has reported would have been affected.
If K2 has no notion of supervision, then sub-sequence seems the next best name to me too (but subseqframe_idx looks weird? supframe_idx seems more easy to comprehend at least for me...)