
[Question] Is the LLaVA-1.6 training/fine-tuning code ready?

Open DafengChi opened this issue 1 year ago • 9 comments

Question

When I load "llava-v1.5-7b", training runs fine, but when I load "llava-v1.6-vicuna-7b" it fails with the following error:

    Traceback (most recent call last):
      File "/home/ma-user/work/chidafeng/Embodied_AI_Agent/llava/train/train_xformers.py", line 13, in <module>
        train()
      File "/home/ma-user/work/chidafeng/EmbodiedAgent/llava/train/train.py", line 970, in train
        trainer.train()
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
        return inner_training_loop(
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 1687, in _inner_training_loop
        model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1198, in prepare
        result = self._prepare_deepspeed(*args)
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1537, in _prepare_deepspeed
        engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/__init__.py", line 171, in initialize
        engine = DeepSpeedEngine(args=args,
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
        self._configure_optimizer(optimizer, model_parameters)
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1234, in _configure_optimizer
        self.optimizer = self._configure_zero_optimizer(basic_optimizer)
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1563, in _configure_zero_optimizer
        optimizer = DeepSpeedZeroOptimizer_Stage3(
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 314, in __init__
        self._create_fp16_partitions_with_defragmentation(self.trainable_param_groups)
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 687, in _create_fp16_partitions_with_defragmentation
        device_buffer = __class__.defragment(parameter_partitions)
      File "/home/ma-user/anaconda3/envs/llava/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 522, in defragment
        assert len(set(t.dtype for t in tensors)) == 1
    AssertionError
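
The failing assertion, `len(set(t.dtype for t in tensors)) == 1`, requires every tensor passed to `defragment` to share one dtype, and the frame above it passes `self.trainable_param_groups`. That suggests the trainable parameters of the loaded checkpoint may have mixed dtypes. A minimal diagnostic sketch, not taken from the LLaVA codebase: the helper name `group_trainable_params_by_dtype` is hypothetical, and it only assumes objects with `dtype` and `requires_grad` attributes, as yielded by PyTorch's `model.named_parameters()`.

```python
from collections import defaultdict


def group_trainable_params_by_dtype(named_parameters):
    """Map each dtype (as a string) to the names of trainable parameters using it.

    `named_parameters` is any iterable of (name, param) pairs where `param`
    exposes `.dtype` and `.requires_grad`, e.g. `model.named_parameters()`.
    """
    groups = defaultdict(list)
    for name, param in named_parameters:
        # Frozen parameters are skipped: only trainable ones are of interest here.
        if param.requires_grad:
            groups[str(param.dtype)].append(name)
    return dict(groups)
```

Calling `group_trainable_params_by_dtype(model.named_parameters())` just before `trainer.train()` and printing the result shows whether more than one dtype is present. If it is, the listed parameter names indicate which modules to inspect.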

DafengChi avatar Mar 12 '24 03:03 DafengChi

+1

pengwangucla avatar Mar 12 '24 20:03 pengwangucla

+1

joaomsimoes avatar Mar 13 '24 05:03 joaomsimoes

+1

yinincanada avatar Mar 15 '24 13:03 yinincanada

+1

linkboyx avatar Mar 19 '24 12:03 linkboyx

∞ +1

LoFiApostasy avatar Mar 22 '24 01:03 LoFiApostasy

+1

markmywords-tech avatar Mar 23 '24 09:03 markmywords-tech

+1

drogozhang avatar Mar 25 '24 19:03 drogozhang

Did anyone get anywhere with this?

jsm69 avatar Apr 07 '24 14:04 jsm69

+1

PzWHU avatar May 09 '24 07:05 PzWHU

any update?

NicoZenith avatar Jun 05 '24 13:06 NicoZenith