TinyLLaVA_Factory

--fp16 True question

Open Liavan0122 opened this issue 1 year ago • 7 comments

I am using custom_finetune.sh and have not changed any other parameter settings. I get the error raise ValueError("Type fp16 is not supported."), i.e. ValueError: Type fp16 is not supported. The installation follows README.md exactly. However, I can enable fp16 in other projects on the same hardware. Please give me some advice. Thank you!

Liavan0122 avatar Jun 13 '24 07:06 Liavan0122

Could you please provide more details about the experimental setup and the error encountered? Additionally, can you confirm if other scripts are running correctly?

shiym2000 avatar Jun 14 '24 10:06 shiym2000

Here is my script, scripts/train/custom_finetune.sh. I only changed DATA_PATH, IMAGE_PATH, and OUTPUT_DIR, and changed localhost:0,1,2,3 to localhost:0,1.

DATA_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/text_files/output_dataformat.json"
IMAGE_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/images"
MODEL_MAX_LENGTH=3072
OUTPUT_DIR="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora"

deepspeed --include localhost:0,1 --master_port 29501 tinyllava/train/custom_finetune.py \
    --deepspeed ./scripts/zero2.json \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --is_multimodal True \
    --conv_version phi \
    --mm_vision_select_layer -2 \
    --image_aspect_ratio square \
    --fp16 True \
    --training_recipe lora \
    --tune_type_llm lora \
    --tune_type_vision_tower frozen \
    --tune_vision_tower_from_layer 0 \
    --tune_type_connector full \
    --lora_r 128 \
    --lora_alpha 256 \
    --group_by_modality_length False \
    --pretrained_model_path "tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B" \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 1e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length $MODEL_MAX_LENGTH \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to tensorboard \
    --tokenizer_use_fast False \
    --run_name custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora

And this is my error message:

......
base_model.model.connector._connector.2.weight: 6553600 parameters
base_model.model.connector._connector.2.bias: 2560 parameters
Traceback (most recent call last):
  File "/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/tinyllava/train/custom_finetune.py", line 52, in <module>
    train()
  File "/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/tinyllava/train/custom_finetune.py", line 47, in train
    trainer.train()
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1933, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1220, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1605, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 240, in __init__
    self._do_sanity_check()
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1040, in _do_sanity_check
    raise ValueError("Type fp16 is not supported.")
ValueError: Type fp16 is not supported.

Thanks for the help!

Liavan0122 avatar Jun 15 '24 09:06 Liavan0122

Hi, could you please check the versions of your packages? Are they accelerate==0.27.2 and deepspeed==0.14.0?
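A quick way to check both versions at once is a short script built on the standard library's importlib.metadata. This is only a minimal sketch: the EXPECTED table restates the two versions named above, and the helper name check_versions is hypothetical, not something the repository provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions named in the comment above; adjust if your README pins others.
EXPECTED = {"accelerate": "0.27.2", "deepspeed": "0.14.0"}


def check_versions(expected, lookup=version):
    """Return {package: (installed_or_None, expected, matches)}.

    `lookup` defaults to importlib.metadata.version and can be swapped
    out for testing.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

Run it with the same Python interpreter that launches training, so it inspects the environment deepspeed actually uses.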

YingHuTsing avatar Jun 15 '24 14:06 YingHuTsing

Yes, they are the same: accelerate 0.27.2 and deepspeed 0.14.0.

I also re-downloaded everything, this time without setting it up in a conda environment, and I encounter the same --fp16 problem.

Liavan0122 avatar Jun 16 '24 11:06 Liavan0122

from deepspeed.accelerator import get_accelerator
flag = get_accelerator().is_fp16_supported()
print(flag)

Please check whether this flag is True or False.

If it is False, then your combination of GPU, CUDA, and DeepSpeed/Accelerator apparently does not support fp16. I am not sure whether their versions are compatible with each other.
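If the flag is False, it may help to see what PyTorch and DeepSpeed each report about the device. The sketch below gathers a few facts and turns them into hints. The function names collect and explain are hypothetical, and the hints are heuristics, not a definitive diagnosis. In particular, the compute-capability threshold of 7.0 is an assumption based on how DeepSpeed's CUDA accelerator appears to decide fp16 support in some releases, so check it against your installed version.

```python
def explain(cuda_available, capability, accelerator_name, fp16_supported):
    """Turn raw environment facts into human-readable hints (heuristics only)."""
    hints = []
    if not cuda_available:
        hints.append("torch.cuda.is_available() is False: PyTorch cannot see a GPU "
                     "(CPU-only build, driver problem, or CUDA_VISIBLE_DEVICES).")
    if accelerator_name is not None and accelerator_name != "cuda":
        hints.append(f"DeepSpeed selected the '{accelerator_name}' accelerator, not 'cuda'.")
    # Assumption: GPUs below compute capability 7.0 may be reported as
    # lacking fp16 support by DeepSpeed's CUDA accelerator.
    if capability is not None and capability[0] < 7:
        hints.append(f"GPU compute capability {capability[0]}.{capability[1]} is below 7.0.")
    if not hints:
        if fp16_supported:
            hints.append("fp16 is reported as supported.")
        else:
            hints.append("No obvious cause found; compare torch, CUDA and deepspeed versions.")
    return hints


def collect():
    """Gather facts; anything that cannot be determined stays at its default."""
    cuda_available, capability, name, flag = False, None, None, None
    try:
        import torch
        cuda_available = torch.cuda.is_available()
        if cuda_available:
            capability = torch.cuda.get_device_capability(0)
    except ImportError:
        pass  # torch is not installed
    try:
        from deepspeed.accelerator import get_accelerator
        accelerator = get_accelerator()
        name = accelerator.device_name()
        flag = accelerator.is_fp16_supported()
    except ImportError:
        pass  # deepspeed is not installed
    return cuda_available, capability, name, flag


if __name__ == "__main__":
    facts = collect()
    print("cuda_available, capability, accelerator, fp16_supported =", facts)
    for hint in explain(*facts):
        print("-", hint)
```

Posting the printed facts together with the torch, CUDA, and deepspeed versions should make it easier for others to tell whether the GPU itself or the software stack is the limiting factor.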

YingHuTsing avatar Jun 17 '24 00:06 YingHuTsing

The flag is False. Thank you for the help!

Liavan0122 avatar Jun 17 '24 03:06 Liavan0122

I'm getting the same error here: flag is False. How can I fix it, please?

DeepSpeed general environment info:
deepspeed ................ 0.15.0
accelerate ............... 0.33.0
torch .................... 1.13.1+cu116
GPU ...................... NVIDIA A100 80GB

zhiwentian avatar Aug 30 '24 10:08 zhiwentian