Can't run examples on Windows 10
Hi, I've tried to run the examples, but I received this error:
(CodeLlama) PS C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama> python -m torch.distributed.run --nproc_per_node 1 example_infilling.py --ckpt_dir CodeLlama-7b-Python --tokenizer_path ./CodeLlama-7b-Python/tokenizer.model
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\example_infilling.py", line 79, in <module>
fire.Fire(main)
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\example_infilling.py", line 18, in main
generator = Llama.build(
^^^^^^^^^^^^
File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\llama\generation.py", line 90, in build
checkpoint = torch.load(ckpt_path, map_location="cpu")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '<'.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 18284) of binary: C:\ProgramData\anaconda3\envs\CodeLlama\python.exe
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 798, in <module>
main()
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 794, in main
run(args)
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 785, in run
elastic_launch(
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_infilling.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-08-28_12:39:51
host : DESKTOP-THP4I5R
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 18284)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs
UPDATE
I made a mistake running the download.sh script: I passed my email instead of the URL received from FB. That means the file saved as the checkpoint was actually an HTML page, which is why torch.load fails with invalid load key, '<'.
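If you hit the same UnpicklingError, a quick way to confirm a bad download is to peek at the first bytes of the checkpoint file. The path below assumes the 7b-Python layout from the command at the top; adjust it to your setup:
# A valid PyTorch checkpoint starts with binary data (zip or pickle magic bytes);
# a failed download is typically an HTML page that starts with '<'.
with open("CodeLlama-7b-Python/consolidated.00.pth", "rb") as f:
    print(f.read(16))  # b'PK\x03\x04'... is fine; something starting with b'<' means re-download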
Did your issue get resolved? I am unable to run on Windows 10 as well. I am getting a "Distributed package doesn't have NCCL built-in" error.
@manoj21192 This will work on Windows. The trick is to initialize the process group with the gloo backend yourself before calling Llama.build, because NCCL is not built into the Windows PyTorch packages:
import torch
from llama import Llama

temperature = 0
top_p = 0
max_seq_len = 4096
max_batch_size = 1
max_gen_len = None
num_of_worlds = 1

# gloo works on Windows; nccl does not (it is Linux-only)
torch.distributed.init_process_group(backend='gloo', init_method='tcp://localhost:23455', world_size=num_of_worlds, rank=0)

generator = Llama.build(
    ckpt_dir="C:/AI/LLaMA2_Docker_FileSystem/codellama/CodeLlama-7b-Instruct",
    tokenizer_path="C:/AI/LLaMA2_Docker_FileSystem/codellama/CodeLlama-7b-Instruct/tokenizer.model",
    max_seq_len=max_seq_len,
    max_batch_size=max_batch_size,
    model_parallel_size=num_of_worlds,
)
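Note that temperature, top_p and max_gen_len above are not consumed by Llama.build; they belong to the generation call. A minimal follow-up using the text_completion API from the repo's example scripts (the prompt here is just an illustration):
results = generator.text_completion(
    ["def fibonacci(n):"],  # batch of one prompt; keep the batch within max_batch_size
    max_gen_len=max_gen_len,
    temperature=temperature,
    top_p=top_p,
)
print(results[0]["generation"])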
> UPDATE: I've made a mistake running the download.sh script. I've passed my email instead of the URL received from FB.
Thank you! I can reproduce this. I first entered my email, then noticed my mistake and entered the correct URL when running download.sh, but loading was still not possible.
I cloned the repository again, entered the correct URL on the first try, and then it worked.
What mistake am I making here?

from typing import Optional
import fire
from llama import Llama

def main(
    ckpt_dir: "D:\pathto\codellama\CodeLlama-7b",
    tokenizer_path: "D:\pathto\codellama\CodeLlama-7b\tokenizer.model",
    temperature: float = 0.2,
    top_p: float = 0.9,
    max_seq_len: int = 256,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

I am getting this error:

D:\path2\codellama>python example_completion.py
ERROR: The function received no value for the required argument: ckpt_dir
Usage: example_completion.py CKPT_DIR TOKENIZER_PATH
  optional flags: --temperature | --top_p | --max_seq_len | --max_batch_size | --max_gen_len
For detailed information on this command, run: example_completion.py --help
@bronzwikgk
Based on the code and error message you've provided, here are some issues I've identified:
- The paths are written as type annotations (string literals after the colon), not as default values, so ckpt_dir and tokenizer_path are still required arguments; that is exactly why Fire reports that ckpt_dir received no value.
- The Windows paths should be raw strings or have their backslashes escaped; otherwise sequences like \t inside the path are interpreted as escape characters.
Here's a revised version of the code:
from typing import Optional
import fire
from llama import Llama

def main(
    ckpt_dir: str = r"D:\pathto\codellama\CodeLlama-7b",
    tokenizer_path: str = r"D:\pathto\codellama\CodeLlama-7b\tokenizer.model",
    temperature: float = 0.2,
    top_p: float = 0.9,
    max_seq_len: int = 256,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

if __name__ == "__main__":
    fire.Fire(main)
- Gave ckpt_dir and tokenizer_path proper str type hints and moved the paths into default values, so they are no longer required on the command line.
- Used raw string literals for the Windows paths (by prefixing the string with an r), which allows the backslashes to be interpreted correctly.
- Added if __name__ == "__main__": fire.Fire(main) so the function runs when the script is executed.
Try running the updated code and see if the error persists.
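With the defaults in place you can run the script bare, or override any argument through Fire's command-line flags (the flag names come from the usage output above), for example:

D:\path2\codellama>python example_completion.py
D:\path2\codellama>python example_completion.py --temperature 0.5 --max_seq_len 512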
Thanks, that moved me one step ahead. Getting this error now:
Traceback (most recent call last):
File "D:\shunyadotek\codellama\example_completion.py", line 55, in <module>
torch.distributed.init_process_group(backend='gloo', init_method='tcp://localhost:23455', world_size=num_of_worlds, rank=0)
@bronzwikgk I don't see this line in your code: torch.distributed.init_process_group(backend='gloo', init_method='tcp://localhost:23455', world_size=num_of_worlds, rank=0)
Are you sure you have it in your code? See my answer a few replies above with the full code that includes this line.
@bronzwikgk Right, I see that you are using torch.distributed.init_process_group("nccl"). NCCL is Linux-only; use my example above, which uses the gloo backend.
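If you want one script that runs on both Linux and Windows, a small sketch of my suggestion is to pick the backend at runtime (port and world size taken from the example above):
import torch.distributed as dist

# Prefer NCCL where the PyTorch build ships it (Linux GPU builds); fall back to gloo elsewhere.
backend = "nccl" if dist.is_nccl_available() else "gloo"
dist.init_process_group(backend=backend, init_method="tcp://localhost:23455", world_size=1, rank=0)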