DistributeFilesDataset with sharding, num seqs seems incorrect
See my recently added test_DistributeFilesDataset_sharding.
I expected that, at the end, global_seq_idx == len(hdf_files) * num_seqs // distrib_size. However, this is not the case.
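To make the expectation concrete, here is a minimal sketch of the count I have in mind. It assumes every HDF file holds the same number of sequences (num_seqs per file); the helper name expected_seqs_per_shard is hypothetical and not part of RETURNN.

```python
def expected_seqs_per_shard(num_files: int, num_seqs_per_file: int, distrib_size: int) -> int:
    """Expected final global_seq_idx on one shard.

    Assumption: all files contain the same number of sequences, and the
    total is divided evenly (floor division) among distrib_size shards.
    """
    total_seqs = num_files * num_seqs_per_file
    return total_seqs // distrib_size


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(expected_seqs_per_shard(num_files=4, num_seqs_per_file=5, distrib_size=2))
```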
When looking at DistributeFilesDataset.init_seq_order, I wonder about this code:
self_index_base = self.partition_epoch * self._shard_index
self_index_end = self_index_base + self.partition_epoch
The self_index_end here ignores self._num_shards. Is this correct?
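The index arithmetic can be checked in isolation with a small sketch. It assumes the file list has already been split into partition_epoch * num_shards chunks before these lines run; that is my assumption and I have not confirmed it. The function names are hypothetical. Under that assumption the per-shard windows tile the chunk list without overlap, even though num_shards does not appear in the end index. If the split used only partition_epoch chunks instead, the window for any shard_index > 0 would run past the end.

```python
from typing import List


def shard_chunk_range(partition_epoch: int, shard_index: int) -> range:
    """Chunk indices for one shard, mirroring the two lines quoted above."""
    self_index_base = partition_epoch * shard_index
    self_index_end = self_index_base + partition_epoch
    return range(self_index_base, self_index_end)


def all_shard_chunk_ranges(partition_epoch: int, num_shards: int) -> List[range]:
    """Windows for every shard, to inspect coverage and overlap."""
    return [shard_chunk_range(partition_epoch, i) for i in range(num_shards)]


if __name__ == "__main__":
    partition_epoch, num_shards = 2, 3
    windows = all_shard_chunk_ranges(partition_epoch, num_shards)
    covered = [idx for window in windows for idx in window]
    # Assumed total number of chunks: partition_epoch * num_shards.
    total_chunks = partition_epoch * num_shards
    print(windows)
    print(covered == list(range(total_chunks)))
```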
(cc @NeoLegends @Icemole @Judyxujj)