Unable to run example.py
I am running torchrun --nproc_per_node 1 example.py --ckpt_dir ./7B/ --tokenizer_path ./tokenizer.model
and my output is
NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
File "/opt/homebrew/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.13.0a0+git49444c3', 'console_scripts', 'torchrun')())
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/run.py", line 762, in main
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/run.py", line 753, in run
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 237, in launch_agent
result = agent.run()
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
result = f(*args, **kwargs)
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
result = self._invoke_run(role)
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 844, in _invoke_run
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
result = f(*args, **kwargs)
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 681, in _initialize_workers
worker_ids = self._start_workers(worker_group)
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
result = f(*args, **kwargs)
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/local_elastic_agent.py", line 271, in _start_workers
self._pcontext = start_processes(
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/__init__.py", line 207, in start_processes
redirs = to_map(redirects, nprocs)
File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 162, in to_map
map[i] = val_or_map.get(i, Std.NONE)
AttributeError: 'NoneType' object has no attribute 'get'
Any idea what's happening here?
I hit the same problem on Windows, but it works fine on Ubuntu.
You need to provide more information: your OS, your Python version, the libraries and their versions, and your GPU.
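If it helps, here is a rough sketch of a script that gathers most of those details in one go. The package names in the default list are an assumption about what this project depends on, so adjust them to match your requirements file. It does not detect the GPU, so you would still need to add that by hand.

```python
import platform
from importlib import metadata


def collect_env_info(packages=("torch", "fairscale", "fire", "sentencepiece")):
    """Return a dict with OS, Python, and package version details.

    The default package list is an assumption, not taken from this thread.
    """
    info = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Reads the installed distribution's version without importing it.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

If PyTorch is installed, running python -m torch.utils.collect_env should give a more complete report that you can paste instead.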
@neuhaus Thank you for suggesting that. Here is the information:
macOS Ventura 13.2.1 (22D68) with M1 Pro, 16-core GPU, 16 GB RAM
Python 3.11.2
Here's my output from pip list
Same issue, bumping 👍 macOS Monterey, M1 Pro, 32 GB
To avoid this issue, downgrade Python. Try 3.7 or so (use a version manager like asdf or rtx), then make sure to install the requirements under the new interpreter.
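As a quick way to see which interpreter you are actually on before and after switching, something like the sketch below might help. The (3, 11) threshold is an assumption based only on the 3.11.2 report earlier in this thread. Whether versions between 3.7 and 3.11 work is not established here, and the function name is just illustrative.

```python
import sys

# Assumption: 3.11 is the only version reported as failing in this thread.
REPORTED_FAILING = (3, 11)


def is_at_least(min_version, current=None):
    """Return True if the interpreter's (major, minor) is >= min_version."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(min_version)


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    if is_at_least(REPORTED_FAILING):
        print(f"Python {running}: at or above the version reported as failing here.")
    else:
        print(f"Python {running}: older than the version reported as failing here.")
```

Run it with the same python that torchrun uses, since a version manager can leave torchrun pointing at a different interpreter than the one on your PATH.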
Same question here.
Any solution found? I'm facing the same issue on Windows.
Are you still seeing this issue on Llama 2?
Same here. I'm using an M1 MacBook. I'll try downgrading Python to 3.7 to see if that helps.