
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`image_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

pavankale2709 opened this issue 1 year ago · 1 comment

We are trying to train the model but are getting the following error. Please help with the resolution.

'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)

Downloading tokenizer_config.json: 100%|##########| 2.50k/2.50k [00:00<00:00, 332kB/s]

Downloading tokenizer.json: 100%|##########| 2.42M/2.42M [00:02<00:00, 1.07MB/s]

Downloading (…)cial_tokens_map.json: 100%|##########| 2.20k/2.20k [00:00<00:00, 441kB/s]

Downloading config.json: 100%|##########| 1.53k/1.53k [00:00<00:00, 382kB/s]

Downloading model.safetensors: 100%|##########| 990M/990M [11:47<00:00, 1.40MB/s]

Some weights of T5ForMultimodalGeneration were not initialized from the model checkpoint at declare-lab/flan-alpaca-base and are newly initialized: ['encoder.image_dense.weight', 'encoder.mha_layer.in_proj_weight', 'encoder.mha_layer.out_proj.weight', 'encoder.image_dense.bias', 'encoder.gate_dense.weight', 'encoder.gate_dense.bias', 'encoder.mha_layer.out_proj.bias', 'encoder.mha_layer.in_proj_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Downloading generation_config.json: 100%|##########| 142/142 [00:00<00:00, 15.8kB/s]

0%| | 0/318150 [00:00<?, ?it/s]
You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.

Traceback (most recent call last):
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 748, in convert_to_tensors
    tensor = as_tensor(value)
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 720, in as_tensor
    return torch.tensor(value)
ValueError: expected sequence of length 577 at dim 1 (got 145)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "...\mm-cot-scienceqa\main.py", line 380, in <module>
    T5Trainer(
  File "...\mm-cot-scienceqa\main.py", line 269, in T5Trainer
    trainer.train()
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\trainer.py", line 1591, in train
    return inner_training_loop(
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\trainer.py", line 1870, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "C:\Users\pakale\Anaconda3\lib\site-packages\accelerate\data_loader.py", line 448, in __iter__
    current_batch = next(dataloader_iter)
  File "C:\Users\pakale\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "C:\Users\pakale\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\pakale\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\trainer_utils.py", line 737, in __call__
    return self.data_collator(features)
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\data\data_collator.py", line 586, in __call__
    features = self.tokenizer.pad(
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 3303, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 223, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "C:\Users\pakale\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 764, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`image_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

0%| | 0/318150 [00:01<?, ?it/s]

====Input Arguments====
{ "data_root": "data", "output_dir": "experiments", "model": "declare-lab/flan-alpaca-base", "options": [ "A", "B", "C", "D", "E" ], "epoch": 50, "lr": 5e-05, "bs": 2, "input_len": 512, "output_len": 512, "eval_bs": 4, "eval_acc": null, "train_split": "train", "val_split": "val", "test_split": "test", "use_generate": true, "final_eval": false, "user_msg": "rationale", "img_type": "vit", "eval_le": null, "test_le": null, "evaluate_dir": null, "caption_file": "data/instruct_captions.json", "use_caption": true, "prompt_format": "QCM-E", "seed": 42 }

img_features size: (11208, 577, 768)

number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

[14:38:56] [Model]: Loading declare-lab/flan-alpaca-base... main.py:66

           [Data]: Reading data...                                main.py:67

experiments/rationale_declare-lab-flan-alpaca-base_vit_QCM-E_lr5e-05_bs0_op512_ep50
model parameters: 251907840
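For reference, the message in the title is the generic Transformers error raised when a batch cannot be packed into a single tensor because its rows have different lengths. A minimal, self-contained sketch of the behaviour the message is hinting at (the texts below are illustrative only, not taken from ScienceQA):

```python
from transformers import AutoTokenizer

# Illustrative only: any fast tokenizer shows the same behaviour.
tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-base")

texts = [
    "A short question.",
    "A much longer question with extra context appended to it.",
]

# Without padding, the two rows have different lengths and cannot be
# stacked into one tensor; this raises the same
# "Unable to create tensor ..." ValueError seen in the traceback above.
# tokenizer(texts, return_tensors="pt")

# With padding (and truncation to a maximum length), every row ends up
# the same length and batching succeeds.
batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
print(batch["input_ids"].shape)  # e.g. torch.Size([2, <padded length>])
```

Here, though, the tensor that actually fails to build is `image_ids`, which the text padding/truncation flags do not touch, so the suggestion in the error message does not resolve it by itself.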

pavankale2709 · Dec 06 '23 12:12

It seems like a padding mismatch issue in the T5 tokenizer. I am not sure whether it is due to an update of the tokenizer library. Could you check with the latest code?
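In case it helps narrow this down: the inner error is `expected sequence of length 577 at dim 1 (got 145)` on `image_ids`, while the log reports img_features of shape (11208, 577, 768), so at least one example seems to reach the collator with a differently shaped feature array. A small diagnostic sketch (the `report_shape_mismatches` helper and the toy batch are hypothetical, not names from this repository):

```python
import numpy as np

def report_shape_mismatches(image_features):
    """Group per-example feature arrays by shape so outliers stand out."""
    shapes = {}
    for idx, feat in enumerate(image_features):
        shapes.setdefault(np.asarray(feat).shape, []).append(idx)
    for shape, indices in shapes.items():
        print(f"shape {shape}: {len(indices)} examples, e.g. indices {indices[:5]}")

# Toy batch mirroring the traceback: two ViT-style (577, 768) arrays and
# one placeholder with a different shape, which is what makes
# torch.tensor() fail when the collator tries to batch them.
toy_batch = [np.zeros((577, 768)), np.zeros((577, 768)), np.zeros((145, 1024))]
report_shape_mismatches(toy_batch)
```

If the data loader fills problems without an image using a zero array of a hard-coded shape (an assumption about the code, not something visible in the log), that shape would need to match the shape of the loaded vit feature file.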

cooelf · May 19 '24 06:05