
Issue: "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0"

weikeltf opened this issue 1 year ago

My GPU is 8× Tesla V100 32G; the software environment is Python 3.10 + CUDA 11.6 + torch 2.0.0 + transformers 4.28.0.dev0. I run the fine-tuning code:

```shell
torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path /llama-13b \
    --data_path alpaca-data-conversation.json \
    --bf16 True \
    --output_dir output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```

But an error is reported:

```
ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0
```

How can I fix this?

weikeltf avatar Apr 13 '23 05:04 weikeltf

Change `--bf16` to `False`; the V100 does not support bf16.
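The underlying rule can be sketched as a small check: bf16 requires compute capability 8.0 (Ampere) or newer, while the V100 is 7.0. This is a minimal sketch with the capability passed in by hand; the helper name `supports_bf16` is hypothetical. On a live machine you can query the capability with `torch.cuda.get_device_capability(0)`.

```python
# bf16 needs an Ampere-or-newer GPU, i.e. CUDA compute capability >= 8.0.
# supports_bf16 is a hypothetical helper, not a FastChat or PyTorch API.
def supports_bf16(compute_capability):
    """compute_capability is a (major, minor) tuple, e.g. (7, 0) for V100."""
    major, _minor = compute_capability
    return major >= 8

print(supports_bf16((7, 0)))  # V100 -> False
print(supports_bf16((8, 0)))  # A100 -> True
```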

sgsdxzy avatar Apr 14 '23 03:04 sgsdxzy

yes, @sgsdxzy is right. Please re-open the issue if you still see it.

zhisbug avatar Apr 22 '23 02:04 zhisbug

Can you tell me how to install flash_attn? Thank you.

luyx33 avatar Apr 23 '23 04:04 luyx33

Thanks! I see it

weikeltf avatar Apr 27 '23 01:04 weikeltf

> Change --bf16 to False, V100 does not support bf16.

> flash_attn

I directly use the open-source Docker environment, which contains flash_attn. But I think `pip3 install` should be straightforward, preferably using the `-i` parameter to specify a pip source.
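The install suggested above would look roughly like this; the mirror URL is a placeholder, and note the package is published on PyPI as `flash-attn`.

```shell
# Install FlashAttention from PyPI (package name: flash-attn).
pip3 install flash-attn

# Optionally point pip at a mirror with -i (URL below is a placeholder):
# pip3 install flash-attn -i https://your-mirror.example/simple
```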

weikeltf avatar Apr 27 '23 02:04 weikeltf

Changing `--bf16` to `False` did not help me. I got another error: `ValueError: --tf32 requires Ampere or a newer GPU arch, cuda>=11 and torch>=1.7`. Any suggestions?

roshan-gopalakrishnan avatar May 11 '23 03:05 roshan-gopalakrishnan

> Changing --bf16 to False did not help me. I got another error ValueError: --tf32 requires Ampere or a newer GPU arch, cuda>=11 and torch>=1.7 Any suggestions?

Me too. How do I fix this?

richardkelly2014 avatar May 22 '23 12:05 richardkelly2014

Set `--tf32` to `False` as well.
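Putting the thread's fixes together, the precision-related flags for a pre-Ampere GPU like the V100 would change as sketched below. Enabling `--fp16` instead is an assumption on my part (the V100 does support fp16 mixed precision via its tensor cores); check that it works with your FSDP setup before relying on it. The elided flags (`...`) are the rest of the original command, unchanged.

```shell
torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_mem.py \
    ... \
    --bf16 False \
    --tf32 False \
    --fp16 True \
    ...
# --fp16 True is optional and an assumption: V100 has no bf16/tf32 support,
# but fp16 mixed precision is available on it.
```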

vsahil avatar Jul 16 '23 21:07 vsahil

I am using an A100 and also got this error. Weird...

OpenJarvisAI avatar Apr 10 '24 15:04 OpenJarvisAI