pytorch-seq2seq
Why the batch size is misplaced in the tensor?
Usually the batch dimension comes first, for example [B, X].
But print(src.shape) gives torch.Size([20, 4]) and torch.Size([22, 4]), where 4 is the batch size.
I believe that, wherever possible, it is better to keep the batch size at position 0. I base this on this Stack Overflow answer: https://stackoverflow.com/questions/49466894/how-to-correctly-give-inputs-to-embedding-lstm-and-linear-layers-in-pytorch
Correct me if I am wrong: is the only reason for swapping the dimensions in the data loader itself ease of use?
There may be speed improvements from the underlying CUDA kernels, but having batch first allows for greater flexibility in loading custom datasets.
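For reference, a minimal sketch in plain PyTorch (a toy LSTM, not the tutorial's actual model) of the two layouts: PyTorch RNN modules default to sequence-first input, and batch_first=True switches them to batch-first.

```python
import torch
import torch.nn as nn

seq_len, batch_size, emb_dim, hid_dim = 20, 4, 32, 64

# Sequence-first (the default): input is [seq_len, batch, emb_dim],
# matching the shapes printed above.
rnn_seq_first = nn.LSTM(emb_dim, hid_dim)
x_seq_first = torch.randn(seq_len, batch_size, emb_dim)
out, _ = rnn_seq_first(x_seq_first)
print(out.shape)  # torch.Size([20, 4, 64])

# Batch-first: input is [batch, seq_len, emb_dim].
rnn_batch_first = nn.LSTM(emb_dim, hid_dim, batch_first=True)
x_batch_first = torch.randn(batch_size, seq_len, emb_dim)
out, _ = rnn_batch_first(x_batch_first)
print(out.shape)  # torch.Size([4, 20, 64])
```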
I want to hear different views on this.
I feel like this is more of an implementation issue, or personal preference.
The way I've structured the tutorials (and the way I think about these things) is: if the main model is one that, by default, accepts data with the sequence length in the first dimension, then I've kept the sequence length as the first dimension as much as possible, e.g. for the RNN-based models. Similarly, if the model expects the batch dimension first, then I've tried to keep the data with the batch dimension first. This is mainly done to reduce the number of permutes/transposes.
I don't believe this reduces "flexibility in loading custom datasets", as the only difference between batch first and sequence first is a single call to permute, but I may consider switching everything to have batch as the first dimension in the future.
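For illustration, a minimal sketch of that single conversion (the tensor here is made up, not taken from the tutorials):

```python
import torch

# A sequence-first batch of token indices, as in the tutorials: [seq_len, batch]
src = torch.zeros(20, 4, dtype=torch.long)

# One permute call gives the batch-first layout: [batch, seq_len]
src_batch_first = src.permute(1, 0)
print(src.shape, src_batch_first.shape)  # torch.Size([20, 4]) torch.Size([4, 20])
```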