LongRecipe

Setting batch size greater than 1 raises errors such as a shape mismatch

Open 233function opened this issue 4 months ago • 0 comments

Exception type: ValueError
Detail:
Traceback (most recent call last):
  File "/checkpoint/binary/train_package/utils/train.py", line 424, in <module>
    LongRecipe_train.train_with_stage()
  File "/checkpoint/binary/train_package/utils/train.py", line 360, in train_with_stage
    model, accelerator = self.train(stage, model, accelerator, train_data_loader, loss_func, optim, scheduler, progress_bar)
  File "/checkpoint/binary/train_package/utils/train.py", line 235, in train
    for idx, batch in enumerate(train_data_loader):
  File "/root/.local/lib/python3.10/site-packages/accelerate/data_loader.py", line 550, in __iter__
    current_batch = next(dataloader_iter)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/root/.local/lib/python3.10/site-packages/transformers/data/data_collator.py", line 92, in default_data_collator
    return torch_default_data_collator(features)
  File "/root/.local/lib/python3.10/site-packages/transformers/data/data_collator.py", line 158, in torch_default_data_collator
    batch[k] = torch.tensor([f[k] for f in features])
ValueError: expected sequence of length 33921 at dim 1 (got 39205)
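
The last frame shows `torch_default_data_collator` calling `torch.tensor` on a list of per-sample lists, which only works when every sample in the batch has the same length. Here two samples in the same batch have 33921 and 39205 tokens, so the call fails as soon as the batch size exceeds 1. One possible workaround, not confirmed by the maintainers, is to pad each field to the longest sample in the batch before stacking. The sketch below is plain Python so it runs without any dependencies; the function name `pad_batch`, the field names, and the pad values (0 for inputs, -100 for labels) are assumptions for illustration, not taken from the LongRecipe code.

```python
from typing import Dict, List

# Assumed pad values per field (hypothetical; adjust to the tokenizer and loss in use).
PAD_VALUES: Dict[str, int] = {"input_ids": 0, "attention_mask": 0, "labels": -100}


def pad_batch(
    features: List[Dict[str, List[int]]],
    pad_values: Dict[str, int] = PAD_VALUES,
) -> Dict[str, List[List[int]]]:
    """Right-pad every field to the longest sample in the batch."""
    batch: Dict[str, List[List[int]]] = {}
    for key in features[0]:
        max_len = max(len(f[key]) for f in features)
        pad = pad_values.get(key, 0)  # fields not listed above fall back to 0
        batch[key] = [
            list(f[key]) + [pad] * (max_len - len(f[key])) for f in features
        ]
    return batch


# Two samples of different lengths, like the ones that triggered the ValueError.
features = [
    {"input_ids": [5, 6, 7], "labels": [5, 6, 7]},
    {"input_ids": [8, 9], "labels": [8, 9]},
]
batch = pad_batch(features)
print(batch["input_ids"])
print(batch["labels"])
```

To use something like this in training, the padded lists would still need to be converted with `torch.tensor` and the function passed as `collate_fn` when the `DataLoader` is built. Whether padding is appropriate depends on how the dataset was prepared; if the training code expects fixed-length samples, the variable lengths may instead point to a preprocessing problem.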

233function, Oct 03 '24 13:10