torchtitan
Error when training llama3
Root Cause (first observed failure):
[0]:
time : 2024-08-05_10:01:43
host : iZuf6ct0ygsd4zjh2lit8uZ
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 46669)
error_file: /tmp/torchelastic_i4d4ivao/none_jzj2c4lc/attempt_0/0/error.json
traceback : Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/ncluster/dushuai/torchtitan/train.py", line 207, in main
    tokenizer = create_tokenizer(tokenizer_type, job_config.model.tokenizer_path)
  File "/ncluster/dushuai/torchtitan/torchtitan/datasets/tokenizer/__init__.py", line 19, in create_tokenizer
    return TikTokenizer(tokenizer_path)
  File "/ncluster/dushuai/torchtitan/torchtitan/datasets/tokenizer/tiktoken.py", line 52, in __init__
    mergeable_ranks = load_tiktoken_bpe(model_path)
  File "/usr/local/lib/python3.10/dist-packages/tiktoken/load.py", line 148, in load_tiktoken_bpe
    return {
  File "/usr/local/lib/python3.10/dist-packages/tiktoken/load.py", line 149, in