
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--local-rank=0']

bestpredicts opened this issue

System Info

transformers 4.7, pytorch 2.0, python 3.9

Run the example code from the transformers documentation:

rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

Error info:

/nfs/v100-022/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Traceback (most recent call last):
  File "/nfs/v100-022/run_clm.py", line 772, in <module>
    main()
  File "/nfs/v100-022/run_clm.py", line 406, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/nfs/v100-022//anaconda3/lib/python3.9/site-packages/transformers/hf_argparser.py", line 341, in parse_args_into_dataclasses
    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--local-rank=0']
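
For reference, the pattern the deprecation warning asks for is reading the rank from the environment rather than from the command line. A minimal sketch (not code from run_clm.py):

import os

# torchrun and `torch.distributed.launch --use-env` export LOCAL_RANK
# instead of appending a '--local-rank=N' command-line argument.
local_rank = int(os.environ.get("LOCAL_RANK", -1))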

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

1. Install the following environment: python 3.9, pytorch 2.1 dev, transformers 4.7
2. Then run:

rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

3. You get the error: ValueError: Some specified arguments are not used by the HfArgumentParser: ['--local-rank=0']

Expected behavior

The command above runs distributed training to completion instead of raising the HfArgumentParser error.

bestpredicts avatar Mar 15 '23 07:03 bestpredicts

Hi @bestpredicts, thanks for raising this issue.

I can confirm that I see the same error with the most recent version of transformers and pytorch 2. I wasn't able to replicate the issue with pytorch 1.13.1 and the same transformers version.

Following the messages in the shared error output, if I set LOCAL_RANK in my environment and pass in --use-env, I am able to run on pytorch 2.

LOCAL_RANK=0,1 CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch --nproc_per_node 2 --use-env examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

amyeroberts avatar Mar 15 '23 18:03 amyeroberts

Also note that torch.distributed.launch is deprecated and torchrun is preferred in PyTorch 2.0.

sgugger avatar Mar 15 '23 18:03 sgugger

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 14 '23 15:04 github-actions[bot]

Has anyone solved this problem? I get the same problem whether I use torchrun or torch.distributed.launch: self.local_rank is -1. My env is pytorch==2.0.0 and transformers==4.30.1.

TXacs avatar Jun 16 '23 02:06 TXacs

You might try migrating to torchrun, e.g.:

torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

for reference on migrating: https://pytorch.org/docs/stable/elastic/run.html
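
To check which convention your launcher is actually using, a quick diagnostic sketch (hypothetical snippet to drop at the top of the training script):

import os
import sys

# Under torchrun the rank arrives via the LOCAL_RANK environment variable;
# under the legacy launcher it arrives as a '--local-rank=N' entry in argv.
print("LOCAL_RANK env:", os.environ.get("LOCAL_RANK"))
print("argv:", sys.argv)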

vejvarm avatar Jul 10 '23 06:07 vejvarm

Have you solved your problem? I ran into the same error when using deepspeed; the solutions provided above didn't work at all. :(

LiuZhihhxx avatar Aug 01 '23 05:08 LiuZhihhxx

Also note that torch.distributed.launch is deprecated and torchrun is preferred in PyTorch 2.0.

Thanks for this tip.

PhenixZhang avatar Aug 11 '23 03:08 PhenixZhang

watching

HeGaoYuan avatar Sep 06 '23 03:09 HeGaoYuan

Printing sys.argv gives:

['train.py', '--local-rank=0', '--model_name_or_path', './checkpoints/vicuna-7b-v1.5', ...]

The other arguments arrive as separate 'key', 'value' tokens, but local_rank is not properly parsed: in the example above, '--local-rank=0' is passed as a single token. I think this is a quirk of torch.distributed.launch, which appends '--local-rank=0' to the argument list in a form that HfArgumentParser cannot parse.

So switching to torchrun, which reads the LOCAL_RANK environment variable instead of passing a --local_rank argument, is one solution.

A hacky fix is to add this before parse_args_into_dataclasses():

import sys

# Rewrite the '--local-rank=N' token injected by torch.distributed.launch
# into the '--local_rank N' form that HfArgumentParser understands.
# Iterate over a copy, since sys.argv is mutated inside the loop.
for arg in list(sys.argv):
    if arg.startswith("--local-rank="):
        rank = arg.split("=", 1)[1]
        sys.argv.remove(arg)
        sys.argv.extend(["--local_rank", rank])
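
Alternatively, assuming a transformers version where parse_args_into_dataclasses accepts return_remaining_strings, you can ask the parser to hand back unknown arguments instead of raising:

# Unknown arguments such as '--local-rank=0' come back in remaining_args
# instead of triggering the ValueError.
model_args, data_args, training_args, remaining_args = parser.parse_args_into_dataclasses(
    return_remaining_strings=True
)

Note that local_rank is then not populated from the command line, so reading LOCAL_RANK from the environment is still needed.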

ZhaoChuyang avatar Dec 24 '23 09:12 ZhaoChuyang

I have this problem:

ValueError: Some specified arguments are not used by the HfArgumentParser: ['-f', '/root/.local/share/jupyter/runtime/kernel-8d0db21b-3ec1-4b17-987c-be497d81b3c5.json']
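
In a notebook, sys.argv contains the Jupyter kernel's own '-f .../kernel-*.json' argument, which HfArgumentParser does not know. One workaround is to bypass sys.argv and pass the argument list explicitly; a sketch, assuming a parser built from TrainingArguments (adapt the dataclasses and values to your own setup):

from transformers import HfArgumentParser, TrainingArguments

parser = HfArgumentParser(TrainingArguments)
# Pass the arguments directly instead of letting the parser read sys.argv,
# which in Jupyter holds the kernel's '-f kernel-....json' flag.
(training_args,) = parser.parse_args_into_dataclasses(
    args=["--output_dir", "/tmp/test-clm", "--per_device_train_batch_size", "4"]
)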

bai-pei-wjsn avatar Jan 05 '24 16:01 bai-pei-wjsn

You might try migrating to torchrun, e.g.:

torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

for reference on migrating: https://pytorch.org/docs/stable/elastic/run.html

Thanks, it works for me.

sqnian avatar Jan 09 '24 06:01 sqnian

Can it run on Colab? I can't get it to work there.

bai-pei-wjsn avatar Jan 09 '24 15:01 bai-pei-wjsn

ValueError: Some specified arguments are not used by the HfArgumentParser: ['--only_optimize_lora']
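
This is the same failure mode: HfArgumentParser only accepts arguments declared as fields on one of its dataclasses, so an undeclared flag raises this error. A sketch of declaring such a flag (the flag name comes from the error message above; the dataclass itself is hypothetical):

from dataclasses import dataclass, field
from transformers import HfArgumentParser

@dataclass
class ExtraArguments:
    # Declare the flag so the parser recognizes it.
    only_optimize_lora: bool = field(default=False)

parser = HfArgumentParser(ExtraArguments)
(extra_args,) = parser.parse_args_into_dataclasses(["--only_optimize_lora"])
print(extra_args.only_optimize_lora)  # True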

riyajatar37003 avatar Jun 03 '24 15:06 riyajatar37003