Albert Zeyer
Without looking at the test or the code: At the time we construct the initial state, we already know the beam size, and we pass it to that function `get_rec_initial_output`....
The initial states/outputs already include the beam. That happens e.g. in `get_rec_initial_output`. There you already have the batch with the beam size.
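As a rough standalone illustration (plain numpy, not the actual RETURNN function; the signature and shapes here are assumptions), this sketches how the initial output can be created with the beam already folded into the batch dim:

```python
import numpy as np

def get_rec_initial_output(batch_size, beam_size, feature_dim):
    """Illustrative only: the beam size is known when the initial state is
    constructed, so the initial output can directly be created with the
    beam folded into the batch dim, i.e. shape [batch * beam, feature]."""
    init = np.zeros((batch_size, feature_dim), dtype="float32")
    # Repeat each batch entry beam_size times -> [batch * beam, feature].
    return np.repeat(init, beam_size, axis=0)

out = get_rec_initial_output(batch_size=2, beam_size=3, feature_dim=5)
assert out.shape == (2 * 3, 5)
```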
For reference, the failure can be seen [here](https://travis-ci.org/rwth-i6/returnn/jobs/544748199#L628) (but I did not really check the test case yet...).
I wonder whether this is still a bug or whether it already works now. Someone should check.
Can you link the corresponding issue on pytorch-to-returnn?
But despite the recommendation not to add broadcast axes manually/explicitly, I think it should still work. More generally, we have the basic principle in RETURNN that the order of...
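For illustration, a minimal numpy sketch (not RETURNN code; the shapes are made up) of what an explicitly added broadcast axis looks like:

```python
import numpy as np

batch, time_, feat = 2, 4, 3
a = np.random.randn(batch, time_, feat)   # e.g. layer output [B, T, F]
b = np.random.randn(batch, feat)          # e.g. per-seq bias [B, F]

# Explicitly adding a broadcast (size-1) axis for the time dim:
out = a + b[:, np.newaxis, :]             # (B, 1, F) broadcasts over T
assert out.shape == (batch, time_, feat)
```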
> My issue was resolved with [rwth-i6/pytorch-to-returnn#58](https://github.com/rwth-i6/pytorch-to-returnn/pull/58). Should we close the issue or do you want to keep it open since it should still work with RETURNN?

No, this problem...
> Sorry if confusing, but this builds on and includes commits from #711, so to be merged afterwards.

Can you rebase now?
> `HDFDataset` ... too costly to shuffle at run-time because of huge number of sequences

Why? What number? This sounds wrong. If the array of offsets fits into memory (which...
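To make the argument concrete, a standalone sketch (the sequence count is made up) showing that permuting an in-memory offsets array scales fine even for millions of sequences:

```python
import time
import numpy as np

num_seqs = 10_000_000                         # assumed "huge" number of seqs
offsets = np.arange(num_seqs, dtype="int64")  # stand-in for per-seq HDF offsets

t0 = time.time()
rng = np.random.default_rng(seed=42)
perm = rng.permutation(num_seqs)              # O(num_seqs), ~80 MB of int64
shuffled_offsets = offsets[perm]
print("shuffled %i seqs in %.2f sec" % (num_seqs, time.time() - t0))
```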
> `random_seed_offset` has the desired effect on the `CombinedDataset` level, however the sequences that are sampled from the `HDFDataset`s are the same for all GPUs

You mean because you stick...
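A minimal sketch of the seeding logic in question (illustrative names, not the actual RETURNN implementation): if the seed that the sub-dataset uses for its sequence order does not include the per-GPU offset, all GPUs get the same permutation; including it makes each GPU sample different sequences.

```python
import numpy as np

def seq_order(num_seqs, epoch, random_seed_offset=0):
    """Illustrative: derive the epoch's shuffled seq order from a seed
    that optionally includes a per-GPU offset."""
    rng = np.random.default_rng(seed=epoch + random_seed_offset)
    return rng.permutation(num_seqs)

# Same seed on every GPU -> identical order, i.e. the same sampled seqs:
print(seq_order(10, epoch=1, random_seed_offset=0))
print(seq_order(10, epoch=1, random_seed_offset=0))
# Distinct per-GPU offsets -> different orders, so each GPU samples differently:
print(seq_order(10, epoch=1, random_seed_offset=0))
print(seq_order(10, epoch=1, random_seed_offset=1))
```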