LLaVA
[Question] Is the LLaVA-1.6 training/fine-tuning code ready?
Question
When I load “llava-v1.5-7b”, the training process runs fine,
but when I load “llava-v1.6-vicuna-7b”, training fails while DeepSpeed ZeRO stage 3 is setting up the optimizer.
The error is:
Traceback (most recent call last):
  File "/home/ma-user/work/chidafeng/Embodied_AI_Agent/llava/train/train_xformers.py", line 13, in <module>
    trainer.train()
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1687, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1198, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1537, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/__init__.py", line 171, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1234, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1563, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer_Stage3(
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 314, in __init__
    self._create_fp16_partitions_with_defragmentation(self.trainable_param_groups)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 687, in _create_fp16_partitions_with_defragmentation
    device_buffer = __class__.defragment(parameter_partitions)
  File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 522, in defragment
    assert len(set(t.dtype for t in tensors)) == 1
AssertionError
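
The failing assertion checks that every tensor passed to defragment has the same dtype, so one way to narrow this down might be to list the trainable parameters by dtype before calling trainer.train(). Below is a minimal sketch of that check, not a fix. The helper name group_trainable_params_by_dtype and the parameter names in the stand-in data are hypothetical and only there so the snippet runs without a checkpoint. With a real model you would pass model.named_parameters() instead.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_trainable_params_by_dtype(named_params):
    """Map each dtype (as a string) to the names of trainable parameters using it."""
    groups = defaultdict(list)
    for name, param in named_params:
        # The traceback shows only trainable_param_groups reaching defragment,
        # so frozen parameters are skipped here.
        if param.requires_grad:
            groups[str(param.dtype)].append(name)
    return dict(groups)


# Hypothetical stand-in parameters (names and dtypes are made up for illustration).
fake_params = [
    ("language_model.weight", SimpleNamespace(dtype="torch.bfloat16", requires_grad=True)),
    ("extra_param", SimpleNamespace(dtype="torch.float32", requires_grad=True)),
    ("frozen_tower.weight", SimpleNamespace(dtype="torch.float16", requires_grad=False)),
]

groups = group_trainable_params_by_dtype(fake_params)
for dtype, names in groups.items():
    # Show the count and a few names per dtype to keep the output short.
    print(dtype, len(names), names[:3])
```

If more than one dtype shows up among the trainable parameters, that would be consistent with the assertion above, and the listed names point at which parameters to inspect.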
+1
+1
+1
+1
∞ +1
+1
+1
Did anyone find a solution?
+1
Any update?