Albert Zeyer
Without looking at the test or the code: At the time we construct the initial state, we already know the beam size, and we pass it to that function `get_rec_initial_output`....
The initial states/outputs already include the beam. That happens e.g. in `get_rec_initial_output`. There you already have the batch with the beam size.
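As a rough standalone illustration (plain numpy, not the actual RETURNN function; the signature and shapes here are assumptions), this sketches how the initial output can be created with the beam already folded into the batch dim:

```python
import numpy as np

def get_rec_initial_output(batch_size, beam_size, feature_dim):
    """Illustrative only: the beam size is known when the initial state is
    constructed, so the initial output can directly be created with the
    beam folded into the batch dim, i.e. shape [batch * beam, feature]."""
    init = np.zeros((batch_size, feature_dim), dtype="float32")
    # Repeat each batch entry beam_size times -> [batch * beam, feature].
    return np.repeat(init, beam_size, axis=0)

out = get_rec_initial_output(batch_size=2, beam_size=3, feature_dim=5)
assert out.shape == (2 * 3, 5)
```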
For reference, the failure can be seen [here](https://travis-ci.org/rwth-i6/returnn/jobs/544748199#L628) (but I did not really check the test case yet...).
I wonder whether this is still a bug or whether it already works now. Someone should check.
Can you link the corresponding issue on pytorch-to-returnn?
But despite the recommendation not to add broadcast axes manually/explicitly, I think it should still work. More generally, we have the basic principle in RETURNN that the order of...
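For illustration, a minimal numpy sketch (not RETURNN code; the shapes are made up) of what an explicitly added broadcast axis looks like:

```python
import numpy as np

batch, time_, feat = 2, 4, 3
a = np.random.randn(batch, time_, feat)   # e.g. layer output [B, T, F]
b = np.random.randn(batch, feat)          # e.g. per-seq bias [B, F]

# Explicitly adding a broadcast (size-1) axis for the time dim:
out = a + b[:, np.newaxis, :]             # (B, 1, F) broadcasts over T
assert out.shape == (batch, time_, feat)
```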
> My issue was resolved with [rwth-i6/pytorch-to-returnn#58](https://github.com/rwth-i6/pytorch-to-returnn/pull/58). Should we close the issue or do you want to keep it open since it should still work with RETURNN?

No, this problem...
> Sorry if confusing, but this builds on and includes commits from #711, so to be merged afterwards.

Can you rebase now?
> `HDFDataset` ... too costly to shuffle at run-time because of huge number of sequences

Why? What number? This sounds wrong. If the array of offsets fits into memory (which...
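To make the argument concrete, a standalone sketch (the sequence count is made up) showing that permuting an in-memory offsets array scales fine even for millions of sequences:

```python
import time
import numpy as np

num_seqs = 10_000_000                         # assumed "huge" number of seqs
offsets = np.arange(num_seqs, dtype="int64")  # stand-in for per-seq HDF offsets

t0 = time.time()
rng = np.random.default_rng(seed=42)
perm = rng.permutation(num_seqs)              # O(num_seqs), ~80 MB of int64
shuffled_offsets = offsets[perm]
print("shuffled %i seqs in %.2f sec" % (num_seqs, time.time() - t0))
```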
> `random_seed_offset` has the desired effect on the `CombinedDataset` level, however the sequences that are sampled from the `HDFDataset`s are the same for all GPUs

You mean because you stick...
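A minimal sketch of the seeding logic in question (illustrative names, not the actual RETURNN implementation): if the seed that the sub-dataset uses for its sequence order does not include the per-GPU offset, all GPUs get the same permutation; including it makes each GPU sample different sequences.

```python
import numpy as np

def seq_order(num_seqs, epoch, random_seed_offset=0):
    """Illustrative: derive the epoch's shuffled seq order from a seed
    that optionally includes a per-GPU offset."""
    rng = np.random.default_rng(seed=epoch + random_seed_offset)
    return rng.permutation(num_seqs)

# Same seed on every GPU -> identical order, i.e. the same sampled seqs:
print(seq_order(10, epoch=1, random_seed_offset=0))
print(seq_order(10, epoch=1, random_seed_offset=0))
# Distinct per-GPU offsets -> different orders, so each GPU samples differently:
print(seq_order(10, epoch=1, random_seed_offset=0))
print(seq_order(10, epoch=1, random_seed_offset=1))
```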