DeepSpeedExamples

map[i] = val_or_map.get(i, Std.NONE) AttributeError: 'NoneType' object has no attribute 'get'

Open SeekPoint opened this issue 2 years ago • 0 comments

(gh_deepspeed) ub2004@ub2004-B85M-A0:~/llm_dev/DeepSpeedExamples/training/data_efficiency/gpt_finetuning$ python -m torch.distributed.launch --nproc_per_node=1 --master_port 12346 run_clm_no_trainer.py --random_ltd --dataset_name ptb_text_only --dataset_config_name penn_treebank --model_name_or_path gpt2 --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --local_rank 2 --num_train_epochs 2 --deepspeed_config config/ds_config_gpt_base_random_ltd.json --deepspeed --seed 1234 --num_warmup_steps 100 --output_dir ./log.log
/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

  warnings.warn(
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/launch.py", line 195, in
    main()
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 237, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 844, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 681, in _initialize_workers
    worker_ids = self._start_workers(worker_group)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/local_elastic_agent.py", line 271, in _start_workers
    self._pcontext = start_processes(
                     ^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/__init__.py", line 207, in start_processes
    redirs = to_map(redirects, nprocs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub2004/anaconda3/envs/gh_deepspeed/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 162, in to_map
    map[i] = val_or_map.get(i, Std.NONE)
             ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'
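
For reference, the FutureWarning in the output above suggests reading the local rank from os.environ['LOCAL_RANK'] rather than from a --local_rank argument. A minimal sketch of what that could look like is below. The helper name resolve_local_rank is hypothetical and is not taken from run_clm_no_trainer.py; it only illustrates the pattern the warning describes, and it is a separate matter from the AttributeError, which is raised inside the launcher before the training script starts.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    Hypothetical helper for illustration only; not part of DeepSpeedExamples.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    # Accept both spellings so a command line written for the deprecated
    # launcher still parses. The value is used only as a fallback.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # parse_known_args ignores the script's other flags (e.g. --seed).
    args, _unknown = parser.parse_known_args(argv)

    # torchrun exports LOCAL_RANK for every worker; prefer it when present.
    return int(environ.get("LOCAL_RANK", args.local_rank))


if __name__ == "__main__":
    print(f"local rank: {resolve_local_rank()}")
```

With this pattern, a manually passed --local_rank 2 (as in the command above) would be ignored whenever LOCAL_RANK is set, which is the case for workers started by torchrun.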

SeekPoint, Apr 19 '23 12:04