--fp16 True question
I am using custom_finetune.sh and have not changed any other parameter settings. I ran into this error: raise ValueError("Type fp16 is not supported.") ValueError: Type fp16 is not supported. I followed README.md for the entire installation. However, I can use fp16 in other projects on the same hardware. Could you give me some advice? Thank you!
Could you please provide more details about the experimental setup and the error encountered? Additionally, can you confirm if other scripts are running correctly?
Here is my script, scripts/train/custom_finetune.sh. I only changed DATA_PATH, IMAGE_PATH and OUTPUT_DIR, and changed localhost:0,1,2,3 to localhost:0,1.
DATA_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/text_files/output_dataformat.json"
IMAGE_PATH="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/dataset/images"
MODEL_MAX_LENGTH=3072
OUTPUT_DIR="/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora"
deepspeed --include localhost:0,1 --master_port 29501 tinyllava/train/custom_finetune.py \
    --deepspeed ./scripts/zero2.json \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --is_multimodal True \
    --conv_version phi \
    --mm_vision_select_layer -2 \
    --image_aspect_ratio square \
    --fp16 True \
    --training_recipe lora \
    --tune_type_llm lora \
    --tune_type_vision_tower frozen \
    --tune_vision_tower_from_layer 0 \
    --tune_type_connector full \
    --lora_r 128 \
    --lora_alpha 256 \
    --group_by_modality_length False \
    --pretrained_model_path "tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B" \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 1e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length $MODEL_MAX_LENGTH \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to tensorboard \
    --tokenizer_use_fast False \
    --run_name custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora
And this is my error message.
......
base_model.model.connector._connector.2.weight: 6553600 parameters
base_model.model.connector._connector.2.bias: 2560 parameters
Traceback (most recent call last):
  File "/home/ailab830/liavanlinux/dl_hw3/TinyLLaVA_Factory/tinyllava/train/custom_finetune.py", line 52, in <module>
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1933, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1220, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1605, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 240, in __init__
    self._do_sanity_check()
  File "/home/ailab830/liavanlinux/dl_hw3/.env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1040, in _do_sanity_check
    raise ValueError("Type fp16 is not supported.")
ValueError: Type fp16 is not supported.
Thanks for the help!
Hi, could you please check the versions of your packages? Are they accelerate==0.27.2 and deepspeed==0.14.0?
Yes, they are the same: accelerate 0.27.2, deepspeed 0.14.0.
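A quick way to answer the version question from Python, as a minimal sketch: it relies only on the standard-library importlib.metadata, and the helper name check_versions is hypothetical. It compares the installed accelerate and deepspeed versions against the ones asked about above and reports None for a package that is not installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the maintainer asked about in this thread.
EXPECTED = {"accelerate": "0.27.2", "deepspeed": "0.14.0"}


def check_versions(expected=EXPECTED):
    """Hypothetical helper: return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package is not installed in the current environment
        report[pkg] = (have, have == want)
    return report


if __name__ == "__main__":
    for pkg, (have, ok) in check_versions().items():
        print(f"{pkg}: installed={have} expected={EXPECTED[pkg]} match={ok}")
```

Run it with the same interpreter that launches deepspeed; a mismatch here would point to a second environment being picked up.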
I also re-downloaded the repository and set it up again, this time without a conda environment, and I still get the same --fp16 problem.
from deepspeed.accelerator import get_accelerator
flag = get_accelerator().is_fp16_supported()
print(flag)
Please check whether this flag is True or False.
If it is False, then your combination of GPU, CUDA and DeepSpeed/Accelerate apparently does not support fp16. I am not sure whether those versions are compatible with each other.
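A slightly broader version of the flag check above, offered as a hedged sketch (the helper name fp16_diagnostics is hypothetical): besides is_fp16_supported(), it records the torch version, whether CUDA is visible to torch, the compute capability of GPU 0, and which device DeepSpeed's selected accelerator reports via get_accelerator().device_name(). Imports are guarded, so it also runs where torch or deepspeed is missing; the corresponding fields then stay None.

```python
from pprint import pprint


def fp16_diagnostics():
    """Hypothetical helper: collect facts relevant to DeepSpeed's fp16 check."""
    info = {
        "torch": None,
        "cuda_available": None,
        "capability": None,
        "ds_accelerator": None,
        "ds_fp16_supported": None,
    }
    try:
        import torch
    except ImportError:
        return info  # torch missing: nothing else can be checked
    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # (major, minor) compute capability of GPU 0, e.g. (8, 0) on an A100
        info["capability"] = torch.cuda.get_device_capability(0)
    try:
        from deepspeed.accelerator import get_accelerator
    except ImportError:
        return info  # deepspeed missing: only the torch-side facts are reported
    acc = get_accelerator()
    info["ds_accelerator"] = acc.device_name()  # e.g. "cuda" for the CUDA accelerator
    info["ds_fp16_supported"] = acc.is_fp16_supported()
    return info


if __name__ == "__main__":
    pprint(fp16_diagnostics())
```

Comparing which accelerator DeepSpeed selected with what torch reports about the GPU may help narrow down whether the problem lies in the GPU/CUDA setup or in how DeepSpeed detects it.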
The flag is False. Thank you for the help!
I'm getting the same error here, and the flag is also False. How can I fix it? DeepSpeed general environment info:
deepspeed: 0.15.0
accelerate: 0.33.0
torch: 1.13.1+cu116
GPU: NVIDIA A100 80GB