ChatGLM2-6B

[BUG] <Error when running fine-tuning with bash train.sh>

Open · dayu1979 opened this issue 2 years ago · 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

When I run fine-tuning with bash train.sh, I get the error: main.py: error: the following arguments are required: --model_name_or_path. I have already specified this argument.

Expected Behavior

The error should not be raised, and the script should run correctly.

Steps To Reproduce

--model_name_or_path ../chatglm2-6b \
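
One possible cause worth ruling out first (an assumption based on the WSL environment, not something confirmed in this thread): if train.sh was edited on Windows, CRLF line endings or trailing spaces after the backslashes silently break the line continuations, so arguments such as --model_name_or_path never reach main.py. A quick check:

```sh
# Show invisible characters: continuation lines should end in "\$";
# "\^M$" means CRLF endings, "\ $" means a trailing space after the backslash.
cat -A train.sh

# If CRLF endings show up, strip them in place:
sed -i 's/\r$//' train.sh
```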

Environment

- OS: WSL2
- Python: 3.10.9
- Transformers: 4.27.1
- PyTorch: 2.0.1+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True

Anything else?

No response

dayu1979 · Jul 06 '23 03:07

Please provide the full error message and the run script.

duzx16 · Jul 06 '23 03:07

> Please provide the full error message and the run script.

Here is the run script:

PRE_SEQ_LEN=128
LR=2e-2
NUM_GPUS=1

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --preprocessing_num_workers 10 \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm2-6b-int4 \
    --output_dir output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 128 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Here is the error:

root@2ebcbc7e11d1:/home1/code/ChatGLM2-6B-main/ptuning# sh train.sh
usage: torchrun [-h] [--nnodes NNODES] [--nproc_per_node NPROC_PER_NODE]
                [--rdzv_backend RDZV_BACKEND] [--rdzv_endpoint RDZV_ENDPOINT]
                [--rdzv_id RDZV_ID] [--rdzv_conf RDZV_CONF] [--standalone]
                [--max_restarts MAX_RESTARTS]
                [--monitor_interval MONITOR_INTERVAL]
                [--start_method {spawn,fork,forkserver}] [--role ROLE] [-m]
                [--no_python] [--run_path] [--log_dir LOG_DIR] [-r REDIRECTS]
                [-t TEE] [--node_rank NODE_RANK] [--master_addr MASTER_ADDR]
                [--master_port MASTER_PORT]
                training_script ...
torchrun: error: unrecognized arguments: --nproc-per-node=1
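
The usage text above already hints at the fix: this torchrun build only lists the underscore spelling --nproc_per_node, which suggests the shell is resolving an older PyTorch than the one reported in the environment. A generic sanity check (nothing here is specific to ChatGLM):

```sh
# torchrun is installed alongside PyTorch; confirm which binary and
# which torch version the shell actually resolves:
which torchrun
python -c "import torch; print(torch.__version__)"

# List the spellings this build accepts for the per-node worker count:
torchrun --help | grep -i nproc
```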

sssssshf · Jul 10 '23 07:07

I ran into the same problem; the solution is here: https://github.com/THUDM/ChatGLM-6B/issues/448

Chen-Wang-CUHK · Jul 13 '23 09:07

> torchrun: error: unrecognized arguments: --nproc-per-node=1

I ran into this problem too. Did you solve it in the end?

asdasdasaasa · Jul 14 '23 10:07

> torchrun: error: unrecognized arguments: --nproc-per-node=1
>
> I ran into this problem too. Did you solve it in the end?

Try changing --nproc-per-node=1 to --nproc_per_node=1.
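
For reference, a sketch of the corrected launcher command, identical to the script above except for the renamed flag:

```sh
PRE_SEQ_LEN=128
LR=2e-2
NUM_GPUS=1

# Underscore spelling, which older torchrun builds also accept:
torchrun --standalone --nnodes=1 --nproc_per_node=$NUM_GPUS main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --preprocessing_num_workers 10 \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm2-6b-int4 \
    --output_dir output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 128 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```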

wusi1590 · Jul 18 '23 13:07