marib00
### 🐛 Describe the bug

According to the documentation, `torch.distributed.tensor.parallel.SequenceParallel` should shard on the sequence dimension, i.e. `[B, T, C] -> [B, T//_world_size, C]`, but it seems to be tiling...
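To make the expected behavior concrete, here is a minimal standalone sketch (not the actual `SequenceParallel` API) of what sharding a `[B, T, C]` tensor on the sequence dimension across `world_size` ranks should look like; the names `world_size`, `B`, `T`, and `C` are illustrative assumptions:

```python
import torch

# Hypothetical illustration: sharding on the sequence dimension (dim=1)
# should give each of the world_size ranks a [B, T // world_size, C] slice,
# not a full tiled copy of the input.
world_size = 4
B, T, C = 2, 8, 16
x = torch.randn(B, T, C)

# One shard per rank, split along the sequence dimension.
shards = torch.chunk(x, world_size, dim=1)
assert all(s.shape == (B, T // world_size, C) for s in shards)

# Concatenating the shards recovers the original tensor exactly,
# which would not hold if each "shard" were a tiled replica.
assert torch.equal(torch.cat(shards, dim=1), x)
```

If `SequenceParallel` were tiling instead, each rank would hold the full `[B, T, C]` tensor rather than a `[B, T // world_size, C]` slice.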
oncall: distributed
triaged