[BUG]
(lmflow) PS E:\LMFlow-main\LMFlow-main> bash ./scripts/run_finetune.sh
[2023-04-24 19:29:27,417] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-24 19:29:27,440] [INFO] [runner.py:540:main] cmd = D:\UserSoftware\Anaconda3\envs\lmflow\python.exe -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path gpt2 --dataset_path E:/LMFlow-main/LMFlow-main/data/alpaca/train --output_dir E:/LMFlow-main/LMFlow-main/output_models/finetune --overwrite_output_dir --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-04-24 19:29:29,047] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-24 19:29:29,047] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-24 19:29:29,047] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-24 19:29:29,047] [INFO] [launch.py:247:main] dist_world_size=1
[2023-04-24 19:29:29,047] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
Traceback (most recent call last):
File "E:\LMFlow-main\LMFlow-main\examples\finetune.py", line 60, in
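The error is a ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0.
The launch command above passes --bf16, but bfloat16 needs torch>=1.10, cuda>=11.0, and an Ampere GPU, and your environment is missing at least one of these. It can be resolved by passing --fp16 instead of --bf16.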
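If you are not sure which flag your machine supports, a quick check from the same conda environment can help. The following is only a minimal sketch: the helper names pick_precision_flag and detect_precision_flag are made up for illustration and are not part of LMFlow. It relies on torch.cuda.is_available() and torch.cuda.is_bf16_supported() from PyTorch, and it assumes that on a machine with no CUDA GPU neither flag should be passed.

```python
def pick_precision_flag(cuda_available: bool, bf16_supported: bool) -> str:
    """Hypothetical helper: choose the mixed-precision flag for finetune.py."""
    if not cuda_available:
        # Assumption: without a CUDA GPU, pass neither --bf16 nor --fp16.
        return ""
    return "--bf16" if bf16_supported else "--fp16"


def detect_precision_flag() -> str:
    """Probe the local PyTorch install and return the suggested flag."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment, so nothing can be probed.
        return ""
    cuda_ok = torch.cuda.is_available()
    # Only query bf16 support when CUDA is actually usable.
    bf16_ok = cuda_ok and torch.cuda.is_bf16_supported()
    return pick_precision_flag(cuda_ok, bf16_ok)


if __name__ == "__main__":
    flag = detect_precision_flag()
    print(flag if flag else "no mixed-precision flag suggested")
```

If this prints --fp16, replace --bf16 with --fp16 in the command that scripts/run_finetune.sh launches.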