LLaVA
[Usage] Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint when I try to pretrain the model
Describe the issue
Issue:
Command:
bash pretrain.sh on my fine-tuned Llama2 model.
Log:
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading checkpoint shards: 100%|██████████| 3/3 [00:55<00:00, 18.40s/it]
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at xDAN-AI/xDAN-L1-llama2-Think-0930-e35 and are newly initialized: ['model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.35.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.33.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.34.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.39.self_attn.rotary_emb.inv_freq', 'model.layers.38.self_attn.rotary_emb.inv_freq', 'model.layers.37.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.32.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.36.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq']
0%| | 0/2181 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/workspace/LLaVA/llava/train/train_mem.py", line 13, in <module>
train()
File "/workspace/LLaVA/llava/train/train.py", line 930, in train
trainer.train()
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1787, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/data_loader.py", line 381, in __iter__
dataloader_iter = super().__iter__()
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 441, in __iter__
return self._get_iterator()
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1084, in __init__
self._reset(loader, first_iter=True)
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1117, in _reset
self._try_put_index()
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1351, in _try_put_index
index = self._next_index()
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 623, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/accelerate/data_loader.py", line 175, in _iter_with_no_split
for idx, batch in enumerate(self.batch_sampler):
File "/root/miniconda3/envs/llava/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 254, in __iter__
for idx in self.sampler:
File "/workspace/LLaVA/llava/train/llava_trainer.py", line 126, in __iter__
indices = get_modality_length_grouped_indices(self.lengths, self.batch_size, self.world_size, generator=self.generator)
File "/workspace/LLaVA/llava/train/llava_trainer.py", line 59, in get_modality_length_grouped_indices
lang_indices, lang_lengths = zip(*[(i, -l) for i, l in enumerate(lengths) if l < 0])
It seems that the rotary embedding parameters are not saved, which should be fine. What error caused the StopIteration exception? The bottom part of the traceback may be important.
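For context, the last frame of the traceback is the modality-length grouping sampler in llava_trainer.py. One plausible way that exact line fails, sketched below with a hypothetical lengths list: if the dataset contains no language-only samples (no negative lengths), the list comprehension is empty and the tuple unpacking blows up before the first batch is produced.

# Hypothetical lengths: every entry positive, i.e. a multimodal-only dataset.
lengths = [128, 256, 512]
try:
    # Same expression as llava_trainer.py line 59 in the traceback above.
    lang_indices, lang_lengths = zip(*[(i, -l) for i, l in enumerate(lengths) if l < 0])
except ValueError as e:
    print("unpacking failed:", e)  # not enough values to unpack (expected 2, got 0)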
@haotian-liu I am facing a similar issue when loading a different model. Can you please explain why it should be fine even if the rotary embedding parameters are not loaded from the model? I aim to use the new model to pretrain and fine-tune LLaVA v1.5. Would it still be fine to do that even if the model is unable to load the rotary embedding parameters?
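Not an official answer, but for what it's worth: the inv_freq tensor of a Llama rotary embedding is not a learned weight. It is a buffer recomputed deterministically from the head dimension and the RoPE base when the model is built, so a checkpoint that does not store it loses nothing. A minimal sketch of the standard formula (illustrative only, the function name is made up):

import torch

def rotary_inv_freq(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies: 1 / base**(2i/d) for i = 0 .. d/2 - 1.
    # They depend only on the config, so the "newly initialized" buffers are
    # identical to the ones that were simply not stored in the checkpoint.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

print(rotary_inv_freq(128)[:4])  # ~ tensor([1.0000, 0.8660, 0.7499, 0.6494])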
For the missing weights, I think it may be caused by the transformers package version. I updated it from 4.31.0 to 4.33.2 and that solved it.
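For anyone hitting the same warning, a quick way to confirm which transformers version the training environment actually picks up before re-running pretrain.sh (4.33.2 is simply the version reported to work here; pin whatever matches LLaVA's own requirements):

from packaging import version
import transformers

# The warning was reported with transformers 4.31.0 and went away after 4.33.2.
installed = version.parse(transformers.__version__)
print("transformers version:", installed)
if installed < version.parse("4.33.2"):
    print("consider upgrading before re-running pretrain.sh")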
@ZizhenWang Thanks. This solves the issue.
@ZizhenWang I faced a problem like #1417, do you know how to solve it? Thanks in advance!