Distributed Timeout during Dataset Tokenization
Please check that this issue hasn't been reported before.
- [x] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
It shouldn't crash or slow down at this point; tokenization should complete without problems.
Current behaviour
It looks like things go wrong between 48000 -> 48612: only 612 samples were tokenized in that step instead of the usual 1000. This happens over and over, with the counter repeatedly advancing by less than +1000.
The following error is triggered:
Tokenizing Prompts (num_proc=64): 92%|█████████▏| 328986/359152 [29:58<12:05, 41.59 examples/s][rank1]:[W516 14:34:56.853490829 socket.cpp:464] [c10d] waitForInput: poll for socket SocketImpl(fd=79, addr=[localhost]:41170, remote=[localhost]:29500) returned 0, likely a timeout
[rank1]:[W516 14:34:56.854597329 socket.cpp:489] [c10d] waitForInput: socket SocketImpl(fd=79, addr=[localhost]:41170, remote=[localhost]:29500) timed out after 1800000ms
[rank1]: Traceback (most recent call last):
[rank1]: File "<frozen runpy>", line 198, in _run_module_as_main
[rank1]: File "<frozen runpy>", line 88, in _run_code
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/cli/train.py", line 124, in <module>
[rank1]: fire.Fire(do_cli)
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
[rank1]: component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
[rank1]: component, remaining_args = _CallAndUpdateTrace(
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[rank1]: component = fn(*varargs, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/cli/train.py", line 98, in do_cli
[rank1]: return do_train(parsed_cfg, parsed_cli_args)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/cli/train.py", line 52, in do_train
[rank1]: dataset_meta = load_datasets(cfg=cfg, cli_args=cli_args)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/common/datasets.py", line 75, in load_datasets
[rank1]: train_dataset, eval_dataset, total_num_steps, prompters = prepare_dataset(
[rank1]: ^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/utils/data/utils.py", line 39, in wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/utils/data/sft.py", line 69, in prepare_dataset
[rank1]: with zero_first(is_local_main_process()):
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/contextlib.py", line 137, in __enter__
[rank1]: return next(self.gen)
[rank1]: ^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/utils/distributed.py", line 118, in zero_first
[rank1]: barrier()
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/axolotl/utils/distributed.py", line 69, in barrier
[rank1]: dist.barrier()
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/casper/miniconda3/envs/train/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 4551, in barrier
[rank1]: work = group.barrier(opts=opts)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: torch.distributed.DistBackendError: [1] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: wait timeout after 1800000ms, keys: /default_pg/0//cuda//0
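For context, 1800000 ms is torch.distributed's default 30-minute process-group timeout: while rank 0 tokenizes, every other rank waits in a barrier and aborts once that window elapses. A minimal sketch of raising the timeout at init time (illustrative only; in axolotl the process group is created by accelerate/torchrun, not by user code):

from datetime import timedelta
import torch.distributed as dist

# Illustrative only -- axolotl/accelerate normally performs this call.
# The default timeout is timedelta(minutes=30), i.e. the 1800000 ms above.
dist.init_process_group(backend="nccl", timeout=timedelta(hours=4))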
Tokenization log:
Tokenizing Prompts (num_proc=64): 0%| | 0/359152 [00:00<?, ? examples/s]
Tokenizing Prompts (num_proc=64): 0%| | 1000/359152 [00:18<1:53:00, 52.82 examples/s]
Tokenizing Prompts (num_proc=64): 0%| | 1000/359152 [00:38<1:53:00, 52.82 examples/s]
Tokenizing Prompts (num_proc=64): 1%| | 2000/359152 [00:41<2:06:23, 47.10 examples/s]
Tokenizing Prompts (num_proc=64): 1%| | 3000/359152 [00:46<1:22:37, 71.83 examples/s]
Tokenizing Prompts (num_proc=64): 1%| | 4000/359152 [00:52<1:02:47, 94.27 examples/s]
Tokenizing Prompts (num_proc=64): 1%|▏ | 5000/359152 [01:00<56:04, 105.27 examples/s]
Tokenizing Prompts (num_proc=64): 2%|▏ | 6000/359152 [01:05<47:12, 124.67 examples/s]
Tokenizing Prompts (num_proc=64): 2%|▏ | 7000/359152 [01:05<32:29, 180.67 examples/s]
Tokenizing Prompts (num_proc=64): 2%|▏ | 7000/359152 [01:18<32:29, 180.67 examples/s]
Tokenizing Prompts (num_proc=64): 2%|▏ | 8000/359152 [01:19<48:28, 120.72 examples/s]
Tokenizing Prompts (num_proc=64): 3%|▎ | 9000/359152 [01:21<37:02, 157.57 examples/s]
Tokenizing Prompts (num_proc=64): 3%|▎ | 10000/359152 [01:23<29:09, 199.55 examples/s]
Tokenizing Prompts (num_proc=64): 3%|▎ | 11000/359152 [01:28<28:52, 200.99 examples/s]
Tokenizing Prompts (num_proc=64): 3%|▎ | 12000/359152 [01:31<23:54, 241.99 examples/s]
Tokenizing Prompts (num_proc=64): 4%|▎ | 13000/359152 [01:33<20:16, 284.55 examples/s]
Tokenizing Prompts (num_proc=64): 4%|▍ | 14000/359152 [01:34<16:39, 345.16 examples/s]
Tokenizing Prompts (num_proc=64): 4%|▍ | 15000/359152 [01:40<22:33, 254.23 examples/s]
Tokenizing Prompts (num_proc=64): 4%|▍ | 16000/359152 [01:47<27:27, 208.30 examples/s]
Tokenizing Prompts (num_proc=64): 5%|▍ | 17000/359152 [01:48<19:47, 288.13 examples/s]
Tokenizing Prompts (num_proc=64): 5%|▌ | 18000/359152 [01:53<23:29, 242.05 examples/s]
Tokenizing Prompts (num_proc=64): 5%|▌ | 19000/359152 [01:55<19:21, 292.78 examples/s]
Tokenizing Prompts (num_proc=64): 6%|▌ | 20000/359152 [01:56<15:35, 362.34 examples/s]
Tokenizing Prompts (num_proc=64): 6%|▌ | 21000/359152 [02:04<23:35, 238.86 examples/s]
Tokenizing Prompts (num_proc=64): 6%|▌ | 22000/359152 [02:09<25:44, 218.26 examples/s]
Tokenizing Prompts (num_proc=64): 6%|▋ | 23000/359152 [02:09<18:16, 306.48 examples/s]
Tokenizing Prompts (num_proc=64): 7%|▋ | 24000/359152 [02:12<17:13, 324.15 examples/s]
Tokenizing Prompts (num_proc=64): 7%|▋ | 25000/359152 [02:13<13:42, 406.14 examples/s]
Tokenizing Prompts (num_proc=64): 7%|▋ | 26000/359152 [02:19<19:45, 281.10 examples/s]
Tokenizing Prompts (num_proc=64): 8%|▊ | 27000/359152 [02:22<18:37, 297.29 examples/s]
Tokenizing Prompts (num_proc=64): 8%|▊ | 28000/359152 [02:22<13:16, 415.80 examples/s]
Tokenizing Prompts (num_proc=64): 8%|▊ | 29000/359152 [02:33<27:09, 202.61 examples/s]
Tokenizing Prompts (num_proc=64): 8%|▊ | 30000/359152 [02:34<20:26, 268.45 examples/s]
Tokenizing Prompts (num_proc=64): 9%|▊ | 31000/359152 [02:38<20:27, 267.38 examples/s]
Tokenizing Prompts (num_proc=64): 9%|▉ | 32000/359152 [02:43<22:04, 247.02 examples/s]
Tokenizing Prompts (num_proc=64): 9%|▉ | 34000/359152 [02:43<12:42, 426.24 examples/s]
Tokenizing Prompts (num_proc=64): 10%|▉ | 35000/359152 [02:44<11:02, 489.44 examples/s]
Tokenizing Prompts (num_proc=64): 10%|█ | 36000/359152 [02:46<09:57, 540.70 examples/s]
Tokenizing Prompts (num_proc=64): 10%|█ | 37000/359152 [02:55<21:27, 250.12 examples/s]
Tokenizing Prompts (num_proc=64): 11%|█ | 38000/359152 [02:56<15:48, 338.73 examples/s]
Tokenizing Prompts (num_proc=64): 11%|█ | 39000/359152 [03:01<18:50, 283.07 examples/s]
Tokenizing Prompts (num_proc=64): 11%|█ | 40000/359152 [03:01<14:20, 370.79 examples/s]
Tokenizing Prompts (num_proc=64): 11%|█▏ | 41000/359152 [03:06<16:54, 313.76 examples/s]
Tokenizing Prompts (num_proc=64): 12%|█▏ | 42000/359152 [03:07<13:05, 403.98 examples/s]
Tokenizing Prompts (num_proc=64): 12%|█▏ | 43000/359152 [03:08<11:14, 468.90 examples/s]
Tokenizing Prompts (num_proc=64): 12%|█▏ | 44000/359152 [03:12<15:00, 350.15 examples/s]
Tokenizing Prompts (num_proc=64): 13%|█▎ | 45000/359152 [03:19<20:45, 252.16 examples/s]
Tokenizing Prompts (num_proc=64): 13%|█▎ | 46000/359152 [03:29<29:25, 177.40 examples/s]
Tokenizing Prompts (num_proc=64): 13%|█▎ | 47000/359152 [03:29<21:30, 241.89 examples/s]
Tokenizing Prompts (num_proc=64): 13%|█▎ | 48000/359152 [03:32<18:50, 275.25 examples/s]
Tokenizing Prompts (num_proc=64): 14%|█▎ | 48612/359152 [03:32<15:41, 329.74 examples/s]
Tokenizing Prompts (num_proc=64): 14%|█▍ | 49612/359152 [03:33<12:24, 415.68 examples/s]
Tokenizing Prompts (num_proc=64): 14%|█▍ | 50612/359152 [03:33<08:42, 590.38 examples/s]
Tokenizing Prompts (num_proc=64): 14%|█▍ | 51612/359152 [03:35<07:56, 645.03 examples/s]
Tokenizing Prompts (num_proc=64): 15%|█▍ | 52612/359152 [03:35<06:31, 782.29 examples/s]
Tokenizing Prompts (num_proc=64): 15%|█▍ | 53612/359152 [03:37<07:49, 651.09 examples/s]
Tokenizing Prompts (num_proc=64): 15%|█▌ | 54612/359152 [03:38<06:19, 802.56 examples/s]
Tokenizing Prompts (num_proc=64): 15%|█▌ | 55612/359152 [03:39<05:26, 930.19 examples/s]
Tokenizing Prompts (num_proc=64): 16%|█▌ | 56612/359152 [03:41<07:51, 641.77 examples/s]
Tokenizing Prompts (num_proc=64): 16%|█▌ | 57612/359152 [03:43<07:49, 641.81 examples/s]
Tokenizing Prompts (num_proc=64): 16%|█▋ | 58612/359152 [03:46<09:28, 528.46 examples/s]
Tokenizing Prompts (num_proc=64): 17%|█▋ | 59612/359152 [03:46<07:33, 660.57 examples/s]
Tokenizing Prompts (num_proc=64): 17%|█▋ | 60612/359152 [03:49<08:47, 565.46 examples/s]
Tokenizing Prompts (num_proc=64): 17%|█▋ | 61612/359152 [03:51<09:36, 515.99 examples/s]
Tokenizing Prompts (num_proc=64): 17%|█▋ | 62612/359152 [03:53<09:58, 495.73 examples/s]
Tokenizing Prompts (num_proc=64): 18%|█▊ | 63612/359152 [03:54<08:55, 551.73 examples/s]
Tokenizing Prompts (num_proc=64): 18%|█▊ | 64224/359152 [03:56<10:08, 485.06 examples/s]
Tokenizing Prompts (num_proc=64): 18%|█▊ | 65224/359152 [04:02<15:45, 310.77 examples/s]
Tokenizing Prompts (num_proc=64): 18%|█▊ | 66224/359152 [04:03<12:15, 398.43 examples/s]
Tokenizing Prompts (num_proc=64): 19%|█▊ | 67224/359152 [04:04<10:31, 462.45 examples/s]
Tokenizing Prompts (num_proc=64): 19%|█▉ | 68224/359152 [04:06<09:55, 488.50 examples/s]
Tokenizing Prompts (num_proc=64): 19%|█▉ | 69224/359152 [04:11<14:32, 332.26 examples/s]
Tokenizing Prompts (num_proc=64): 20%|█▉ | 70224/359152 [04:17<18:49, 255.91 examples/s]
Tokenizing Prompts (num_proc=64): 20%|█▉ | 71224/359152 [04:25<24:05, 199.16 examples/s]
Tokenizing Prompts (num_proc=64): 20%|██ | 72224/359152 [04:25<17:09, 278.59 examples/s]
Tokenizing Prompts (num_proc=64): 20%|██ | 73224/359152 [04:31<20:56, 227.55 examples/s]
Tokenizing Prompts (num_proc=64): 21%|██ | 74224/359152 [04:32<15:07, 313.99 examples/s]
Tokenizing Prompts (num_proc=64): 21%|██ | 75224/359152 [04:32<11:35, 408.16 examples/s]
Tokenizing Prompts (num_proc=64): 21%|██ | 76224/359152 [04:36<12:44, 369.93 examples/s]
Tokenizing Prompts (num_proc=64): 22%|██▏ | 77224/359152 [04:41<16:07, 291.35 examples/s]
Tokenizing Prompts (num_proc=64): 22%|██▏ | 78224/359152 [04:43<14:17, 327.43 examples/s]
Tokenizing Prompts (num_proc=64): 22%|██▏ | 80224/359152 [04:44<09:07, 509.11 examples/s]
Tokenizing Prompts (num_proc=64): 23%|██▎ | 81224/359152 [04:50<13:12, 350.77 examples/s]
Tokenizing Prompts (num_proc=64): 23%|██▎ | 82224/359152 [04:52<11:45, 392.79 examples/s]
Tokenizing Prompts (num_proc=64): 23%|██▎ | 83224/359152 [04:55<13:05, 351.07 examples/s]
Tokenizing Prompts (num_proc=64): 23%|██▎ | 84224/359152 [04:59<14:39, 312.68 examples/s]
Tokenizing Prompts (num_proc=64): 24%|██▎ | 85224/359152 [05:01<12:21, 369.30 examples/s]
Tokenizing Prompts (num_proc=64): 24%|██▍ | 86224/359152 [05:06<15:58, 284.84 examples/s]
Tokenizing Prompts (num_proc=64): 24%|██▍ | 87224/359152 [05:07<12:09, 372.99 examples/s]
Tokenizing Prompts (num_proc=64): 25%|██▍ | 88224/359152 [05:09<10:49, 417.08 examples/s]
Tokenizing Prompts (num_proc=64): 25%|██▍ | 89224/359152 [05:13<13:12, 340.80 examples/s]
Tokenizing Prompts (num_proc=64): 25%|██▌ | 90224/359152 [05:13<09:52, 453.60 examples/s]
Tokenizing Prompts (num_proc=64): 25%|██▌ | 91224/359152 [05:19<14:28, 308.40 examples/s]
Tokenizing Prompts (num_proc=64): 26%|██▌ | 92224/359152 [05:20<11:21, 391.89 examples/s]
Tokenizing Prompts (num_proc=64): 26%|██▌ | 93224/359152 [05:21<08:31, 519.79 examples/s]
Tokenizing Prompts (num_proc=64): 26%|██▌ | 93836/359152 [05:21<07:10, 616.80 examples/s]
Tokenizing Prompts (num_proc=64): 26%|██▋ | 94836/359152 [05:21<05:24, 813.60 examples/s]
Tokenizing Prompts (num_proc=64): 27%|██▋ | 95836/359152 [05:22<04:29, 978.37 examples/s]
Tokenizing Prompts (num_proc=64): 27%|██▋ | 96836/359152 [05:26<08:24, 520.27 examples/s]
Tokenizing Prompts (num_proc=64): 27%|██▋ | 97836/359152 [05:29<10:09, 429.01 examples/s]
Tokenizing Prompts (num_proc=64): 28%|██▊ | 98836/359152 [05:31<10:02, 431.85 examples/s]
Tokenizing Prompts (num_proc=64): 28%|██▊ | 99836/359152 [05:33<08:45, 493.80 examples/s]
Tokenizing Prompts (num_proc=64): 28%|██▊ | 100448/359152 [05:37<12:52, 334.85 examples/s]
Tokenizing Prompts (num_proc=64): 28%|██▊ | 101448/359152 [05:38<10:04, 426.43 examples/s]
Tokenizing Prompts (num_proc=64): 29%|██▊ | 102448/359152 [05:38<07:00, 610.22 examples/s]
Tokenizing Prompts (num_proc=64): 29%|██▉ | 103448/359152 [05:39<06:03, 703.24 examples/s]
Tokenizing Prompts (num_proc=64): 29%|██▉ | 104448/359152 [05:45<12:47, 331.94 examples/s]
Tokenizing Prompts (num_proc=64): 29%|██▉ | 105448/359152 [05:52<17:44, 238.28 examples/s]
Tokenizing Prompts (num_proc=64): 30%|██▉ | 106448/359152 [06:00<22:15, 189.29 examples/s]
Tokenizing Prompts (num_proc=64): 30%|██▉ | 107448/359152 [06:01<16:27, 254.93 examples/s]
Tokenizing Prompts (num_proc=64): 30%|███ | 108448/359152 [06:02<12:58, 322.24 examples/s]
Tokenizing Prompts (num_proc=64): 30%|███ | 109448/359152 [06:06<13:58, 297.83 examples/s]
Tokenizing Prompts (num_proc=64): 31%|███ | 110448/359152 [06:09<13:07, 315.96 examples/s]
Tokenizing Prompts (num_proc=64): 31%|███ | 111448/359152 [06:10<10:24, 396.65 examples/s]
Tokenizing Prompts (num_proc=64): 31%|███ | 112060/359152 [06:10<09:05, 452.80 examples/s]
Tokenizing Prompts (num_proc=64): 31%|███▏ | 113060/359152 [06:10<06:21, 644.61 examples/s]
Tokenizing Prompts (num_proc=64): 32%|███▏ | 114060/359152 [06:20<16:55, 241.40 examples/s]
Tokenizing Prompts (num_proc=64): 32%|███▏ | 114672/359152 [06:22<15:14, 267.31 examples/s]
Tokenizing Prompts (num_proc=64): 32%|███▏ | 115672/359152 [06:29<20:16, 200.10 examples/s]
Tokenizing Prompts (num_proc=64): 32%|███▏ | 116672/359152 [06:30<14:32, 277.95 examples/s]
Tokenizing Prompts (num_proc=64): 33%|███▎ | 117672/359152 [06:32<13:00, 309.47 examples/s]
Tokenizing Prompts (num_proc=64): 33%|███▎ | 118672/359152 [06:38<15:35, 257.08 examples/s]
Tokenizing Prompts (num_proc=64): 33%|███▎ | 119672/359152 [06:39<12:46, 312.34 examples/s]
Tokenizing Prompts (num_proc=64): 34%|███▎ | 120672/359152 [06:42<12:20, 322.25 examples/s]
Tokenizing Prompts (num_proc=64): 34%|███▍ | 121672/359152 [06:52<20:33, 192.54 examples/s]
Tokenizing Prompts (num_proc=64): 34%|███▍ | 122672/359152 [06:53<15:41, 251.07 examples/s]
Tokenizing Prompts (num_proc=64): 34%|███▍ | 123672/359152 [06:55<12:40, 309.46 examples/s]
Tokenizing Prompts (num_proc=64): 35%|███▍ | 124672/359152 [06:55<09:12, 424.12 examples/s]
Tokenizing Prompts (num_proc=64): 35%|███▍ | 125672/359152 [06:59<11:16, 345.28 examples/s]
Tokenizing Prompts (num_proc=64): 35%|███▌ | 126672/359152 [07:01<09:47, 396.03 examples/s]
Tokenizing Prompts (num_proc=64): 36%|███▌ | 127672/359152 [07:01<07:06, 543.29 examples/s]
Tokenizing Prompts (num_proc=64): 36%|███▌ | 128672/359152 [07:07<12:17, 312.62 examples/s]
Tokenizing Prompts (num_proc=64): 36%|███▌ | 129672/359152 [07:09<10:21, 369.33 examples/s]
Tokenizing Prompts (num_proc=64): 37%|███▋ | 131672/359152 [07:13<08:36, 440.76 examples/s]
Tokenizing Prompts (num_proc=64): 37%|███▋ | 132284/359152 [07:23<17:44, 213.21 examples/s]
Tokenizing Prompts (num_proc=64): 37%|███▋ | 133284/359152 [07:24<14:21, 262.15 examples/s]
Tokenizing Prompts (num_proc=64): 37%|███▋ | 134284/359152 [07:26<12:04, 310.47 examples/s]
Tokenizing Prompts (num_proc=64): 38%|███▊ | 135284/359152 [07:29<12:10, 306.59 examples/s]
Tokenizing Prompts (num_proc=64): 38%|███▊ | 136284/359152 [07:31<09:50, 377.28 examples/s]
Tokenizing Prompts (num_proc=64): 38%|███▊ | 136896/359152 [07:31<08:46, 422.16 examples/s]
Tokenizing Prompts (num_proc=64): 38%|███▊ | 137896/359152 [07:33<07:57, 463.43 examples/s]
Tokenizing Prompts (num_proc=64): 39%|███▊ | 138896/359152 [07:35<07:29, 489.76 examples/s]
Tokenizing Prompts (num_proc=64): 39%|███▉ | 139896/359152 [07:37<07:33, 483.59 examples/s]
Tokenizing Prompts (num_proc=64): 39%|███▉ | 140896/359152 [07:41<10:05, 360.59 examples/s]
Tokenizing Prompts (num_proc=64): 39%|███▉ | 141508/359152 [07:42<09:18, 389.45 examples/s]
Tokenizing Prompts (num_proc=64): 40%|███▉ | 142508/359152 [07:43<06:51, 526.57 examples/s]
Tokenizing Prompts (num_proc=64): 40%|███▉ | 143508/359152 [07:52<15:02, 238.97 examples/s]
Tokenizing Prompts (num_proc=64): 40%|████ | 144508/359152 [07:55<13:49, 258.87 examples/s]
Tokenizing Prompts (num_proc=64): 41%|████ | 145508/359152 [07:57<11:32, 308.73 examples/s]
Tokenizing Prompts (num_proc=64): 41%|████ | 146508/359152 [08:02<13:42, 258.54 examples/s]
Tokenizing Prompts (num_proc=64): 41%|████ | 147508/359152 [08:03<10:28, 336.87 examples/s]
Tokenizing Prompts (num_proc=64): 41%|████▏ | 148508/359152 [08:17<21:24, 163.98 examples/s]
Tokenizing Prompts (num_proc=64): 42%|████▏ | 149508/359152 [08:17<15:23, 227.00 examples/s]
Tokenizing Prompts (num_proc=64): 42%|████▏ | 150508/359152 [08:20<13:56, 249.40 examples/s]
Tokenizing Prompts (num_proc=64): 42%|████▏ | 151508/359152 [08:25<14:17, 242.11 examples/s]
Tokenizing Prompts (num_proc=64): 42%|████▏ | 152508/359152 [08:33<18:45, 183.58 examples/s]
Tokenizing Prompts (num_proc=64): 43%|████▎ | 153508/359152 [08:39<19:38, 174.46 examples/s]
Tokenizing Prompts (num_proc=64): 43%|████▎ | 154508/359152 [08:43<17:01, 200.34 examples/s]
Tokenizing Prompts (num_proc=64): 43%|████▎ | 155508/359152 [08:52<21:38, 156.86 examples/s]
Tokenizing Prompts (num_proc=64): 43%|████▎ | 156120/359152 [08:54<19:16, 175.62 examples/s]
Tokenizing Prompts (num_proc=64): 44%|████▎ | 157120/359152 [08:57<15:42, 214.44 examples/s]
Tokenizing Prompts (num_proc=64): 44%|████▍ | 158120/359152 [09:01<15:00, 223.26 examples/s]
Tokenizing Prompts (num_proc=64): 44%|████▍ | 159120/359152 [09:04<13:03, 255.39 examples/s]
Tokenizing Prompts (num_proc=64): 45%|████▍ | 160120/359152 [09:05<10:07, 327.60 examples/s]
Tokenizing Prompts (num_proc=64): 45%|████▍ | 161120/359152 [09:05<07:39, 430.73 examples/s]
Tokenizing Prompts (num_proc=64): 45%|████▌ | 162120/359152 [09:12<12:11, 269.43 examples/s]
Tokenizing Prompts (num_proc=64): 45%|████▌ | 163120/359152 [09:13<09:25, 346.82 examples/s]
Tokenizing Prompts (num_proc=64): 46%|████▌ | 164120/359152 [09:15<08:18, 391.61 examples/s]
Tokenizing Prompts (num_proc=64): 46%|████▌ | 165120/359152 [09:16<06:26, 501.48 examples/s]
Tokenizing Prompts (num_proc=64): 46%|████▋ | 166120/359152 [09:18<06:52, 468.13 examples/s]
Tokenizing Prompts (num_proc=64): 47%|████▋ | 167120/359152 [09:20<06:33, 487.53 examples/s]
Tokenizing Prompts (num_proc=64): 47%|████▋ | 168120/359152 [09:22<06:36, 481.24 examples/s]
Tokenizing Prompts (num_proc=64): 47%|████▋ | 168732/359152 [09:24<07:02, 450.67 examples/s]
Tokenizing Prompts (num_proc=64): 47%|████▋ | 169732/359152 [09:30<10:39, 296.00 examples/s]
Tokenizing Prompts (num_proc=64): 48%|████▊ | 170732/359152 [09:30<07:41, 407.87 examples/s]
Tokenizing Prompts (num_proc=64): 48%|████▊ | 171732/359152 [09:33<07:40, 407.02 examples/s]
Tokenizing Prompts (num_proc=64): 48%|████▊ | 172344/359152 [09:33<07:02, 441.78 examples/s]
Tokenizing Prompts (num_proc=64): 48%|████▊ | 173344/359152 [09:37<08:29, 364.80 examples/s]
Tokenizing Prompts (num_proc=64): 49%|████▊ | 174344/359152 [09:39<07:52, 391.30 examples/s]
Tokenizing Prompts (num_proc=64): 49%|████▉ | 175344/359152 [09:48<13:28, 227.21 examples/s]
Tokenizing Prompts (num_proc=64): 49%|████▉ | 176344/359152 [09:50<11:45, 259.29 examples/s]
Tokenizing Prompts (num_proc=64): 49%|████▉ | 177344/359152 [09:54<11:25, 265.06 examples/s]
Tokenizing Prompts (num_proc=64): 50%|████▉ | 178344/359152 [10:05<18:27, 163.28 examples/s]
Tokenizing Prompts (num_proc=64): 50%|████▉ | 178956/359152 [10:07<15:58, 187.97 examples/s]
Tokenizing Prompts (num_proc=64): 50%|████▉ | 178956/359152 [10:18<15:58, 187.97 examples/s]
Tokenizing Prompts (num_proc=64): 50%|████▉ | 179568/359152 [10:19<25:34, 117.03 examples/s]
Tokenizing Prompts (num_proc=64): 50%|█████ | 180568/359152 [10:19<17:17, 172.07 examples/s]
Tokenizing Prompts (num_proc=64): 50%|█████ | 181180/359152 [10:23<16:44, 177.26 examples/s]
Tokenizing Prompts (num_proc=64): 51%|█████ | 182180/359152 [10:25<13:33, 217.58 examples/s]
Tokenizing Prompts (num_proc=64): 51%|█████ | 183180/359152 [10:30<13:24, 218.83 examples/s]
Tokenizing Prompts (num_proc=64): 51%|█████▏ | 184180/359152 [10:31<09:47, 297.82 examples/s]
Tokenizing Prompts (num_proc=64): 52%|█████▏ | 185180/359152 [10:31<06:50, 423.48 examples/s]
Tokenizing Prompts (num_proc=64): 52%|█████▏ | 186180/359152 [10:31<05:20, 540.22 examples/s]
Tokenizing Prompts (num_proc=64): 52%|█████▏ | 187180/359152 [10:32<04:33, 628.00 examples/s]
Tokenizing Prompts (num_proc=64): 52%|█████▏ | 188180/359152 [10:37<07:26, 382.68 examples/s]
Tokenizing Prompts (num_proc=64): 53%|█████▎ | 189180/359152 [10:48<13:58, 202.74 examples/s]
Tokenizing Prompts (num_proc=64): 53%|█████▎ | 189792/359152 [10:50<13:36, 207.52 examples/s]
Tokenizing Prompts (num_proc=64): 53%|█████▎ | 190792/359152 [10:56<14:27, 194.12 examples/s]
Tokenizing Prompts (num_proc=64): 53%|█████▎ | 191792/359152 [11:02<14:49, 188.24 examples/s]
Tokenizing Prompts (num_proc=64): 54%|█████▎ | 192792/359152 [11:05<12:59, 213.37 examples/s]
Tokenizing Prompts (num_proc=64): 54%|█████▍ | 193792/359152 [11:08<11:23, 242.02 examples/s]
Tokenizing Prompts (num_proc=64): 54%|█████▍ | 194792/359152 [11:17<15:33, 176.00 examples/s]
Tokenizing Prompts (num_proc=64): 55%|█████▍ | 195792/359152 [11:23<15:16, 178.32 examples/s]
Tokenizing Prompts (num_proc=64): 55%|█████▍ | 196792/359152 [11:30<16:58, 159.41 examples/s]
Tokenizing Prompts (num_proc=64): 55%|█████▌ | 197792/359152 [11:33<14:12, 189.34 examples/s]
Tokenizing Prompts (num_proc=64): 56%|█████▌ | 199792/359152 [11:40<11:41, 227.32 examples/s]
Tokenizing Prompts (num_proc=64): 56%|█████▌ | 200792/359152 [11:41<09:27, 279.29 examples/s]
Tokenizing Prompts (num_proc=64): 56%|█████▌ | 201792/359152 [11:58<09:23, 279.29 examples/s]
Tokenizing Prompts (num_proc=64): 56%|█████▋ | 202792/359152 [12:01<16:00, 162.86 examples/s]
Tokenizing Prompts (num_proc=64): 57%|█████▋ | 203792/359152 [12:10<17:38, 146.75 examples/s]
Tokenizing Prompts (num_proc=64): 57%|█████▋ | 205792/359152 [12:13<12:18, 207.77 examples/s]
Tokenizing Prompts (num_proc=64): 58%|█████▊ | 206792/359152 [12:14<09:54, 256.18 examples/s]
Tokenizing Prompts (num_proc=64): 58%|█████▊ | 207792/359152 [12:15<07:47, 323.73 examples/s]
Tokenizing Prompts (num_proc=64): 58%|█████▊ | 208792/359152 [12:22<10:28, 239.11 examples/s]
Tokenizing Prompts (num_proc=64): 58%|█████▊ | 209792/359152 [12:24<09:18, 267.58 examples/s]
Tokenizing Prompts (num_proc=64): 59%|█████▊ | 210404/359152 [12:26<08:56, 277.43 examples/s]
Tokenizing Prompts (num_proc=64): 59%|█████▊ | 210404/359152 [12:38<08:56, 277.43 examples/s]
Tokenizing Prompts (num_proc=64): 59%|█████▉ | 211404/359152 [12:42<18:16, 134.75 examples/s]
Tokenizing Prompts (num_proc=64): 59%|█████▉ | 212404/359152 [12:50<18:02, 135.59 examples/s]
Tokenizing Prompts (num_proc=64): 59%|█████▉ | 213404/359152 [13:00<19:49, 122.55 examples/s]
Tokenizing Prompts (num_proc=64): 60%|█████▉ | 214404/359152 [13:01<14:57, 161.30 examples/s]
Tokenizing Prompts (num_proc=64): 60%|█████▉ | 215016/359152 [13:06<15:16, 157.32 examples/s]
Tokenizing Prompts (num_proc=64): 60%|██████ | 215628/359152 [13:07<13:00, 183.86 examples/s]
Tokenizing Prompts (num_proc=64): 60%|██████ | 216628/359152 [13:08<09:36, 247.41 examples/s]
Tokenizing Prompts (num_proc=64): 60%|██████ | 217240/359152 [13:16<14:31, 162.85 examples/s]
Tokenizing Prompts (num_proc=64): 61%|██████ | 218240/359152 [13:18<10:18, 227.92 examples/s]
Tokenizing Prompts (num_proc=64): 61%|██████ | 219240/359152 [13:18<06:58, 334.34 examples/s]
Tokenizing Prompts (num_proc=64): 61%|██████▏ | 220240/359152 [13:24<09:13, 250.76 examples/s]
Tokenizing Prompts (num_proc=64): 62%|██████▏ | 221240/359152 [13:30<10:45, 213.65 examples/s]
Tokenizing Prompts (num_proc=64): 62%|██████▏ | 221852/359152 [13:30<08:28, 270.11 examples/s]
Tokenizing Prompts (num_proc=64): 62%|██████▏ | 222852/359152 [13:33<07:42, 294.57 examples/s]
Tokenizing Prompts (num_proc=64): 62%|██████▏ | 223464/359152 [13:35<07:47, 289.99 examples/s]
Tokenizing Prompts (num_proc=64): 62%|██████▏ | 224464/359152 [13:41<09:57, 225.24 examples/s]
Tokenizing Prompts (num_proc=64): 63%|██████▎ | 225464/359152 [13:42<06:54, 322.63 examples/s]
Tokenizing Prompts (num_proc=64): 63%|██████▎ | 226464/359152 [13:55<13:44, 160.89 examples/s]
Tokenizing Prompts (num_proc=64): 63%|██████▎ | 227464/359152 [13:55<09:32, 230.05 examples/s]
Tokenizing Prompts (num_proc=64): 64%|██████▎ | 228464/359152 [13:59<09:24, 231.36 examples/s]
Tokenizing Prompts (num_proc=64): 64%|██████▍ | 229464/359152 [14:01<07:49, 275.94 examples/s]
Tokenizing Prompts (num_proc=64): 64%|██████▍ | 230464/359152 [14:05<07:47, 275.44 examples/s]
Tokenizing Prompts (num_proc=64): 64%|██████▍ | 231464/359152 [14:09<07:48, 272.31 examples/s]
Tokenizing Prompts (num_proc=64): 65%|██████▍ | 232464/359152 [14:14<08:53, 237.25 examples/s]
Tokenizing Prompts (num_proc=64): 65%|██████▌ | 233464/359152 [14:16<07:35, 276.20 examples/s]
Tokenizing Prompts (num_proc=64): 65%|██████▌ | 234464/359152 [14:23<09:31, 218.11 examples/s]
Tokenizing Prompts (num_proc=64): 66%|██████▌ | 235464/359152 [14:24<07:13, 285.45 examples/s]
Tokenizing Prompts (num_proc=64): 66%|██████▌ | 235464/359152 [14:38<07:13, 285.45 examples/s]
Tokenizing Prompts (num_proc=64): 66%|██████▌ | 236464/359152 [14:38<13:45, 148.67 examples/s]
Tokenizing Prompts (num_proc=64): 66%|██████▌ | 237464/359152 [14:39<10:09, 199.60 examples/s]
Tokenizing Prompts (num_proc=64): 66%|██████▋ | 238076/359152 [14:44<10:56, 184.56 examples/s]
Tokenizing Prompts (num_proc=64): 67%|██████▋ | 239076/359152 [14:47<09:38, 207.52 examples/s]
Tokenizing Prompts (num_proc=64): 67%|██████▋ | 240076/359152 [14:54<10:58, 180.85 examples/s]
Tokenizing Prompts (num_proc=64): 67%|██████▋ | 241076/359152 [14:59<10:33, 186.40 examples/s]
Tokenizing Prompts (num_proc=64): 67%|██████▋ | 242076/359152 [15:00<07:20, 265.71 examples/s]
Tokenizing Prompts (num_proc=64): 68%|██████▊ | 243076/359152 [15:05<08:29, 228.01 examples/s]
Tokenizing Prompts (num_proc=64): 68%|██████▊ | 244076/359152 [15:11<08:56, 214.65 examples/s]
Tokenizing Prompts (num_proc=64): 68%|██████▊ | 245076/359152 [15:13<07:43, 245.87 examples/s]
Tokenizing Prompts (num_proc=64): 69%|██████▊ | 246076/359152 [15:21<09:45, 193.22 examples/s]
Tokenizing Prompts (num_proc=64): 69%|██████▉ | 247076/359152 [15:22<07:05, 263.30 examples/s]
Tokenizing Prompts (num_proc=64): 69%|██████▉ | 248076/359152 [15:29<09:14, 200.16 examples/s]
Tokenizing Prompts (num_proc=64): 69%|██████▉ | 249076/359152 [15:47<16:04, 114.18 examples/s]
Tokenizing Prompts (num_proc=64): 70%|██████▉ | 249688/359152 [15:51<15:14, 119.64 examples/s]
Tokenizing Prompts (num_proc=64): 70%|██████▉ | 250688/359152 [15:56<13:01, 138.87 examples/s]
Tokenizing Prompts (num_proc=64): 70%|██████▉ | 251300/359152 [16:01<13:18, 135.05 examples/s]
Tokenizing Prompts (num_proc=64): 70%|███████ | 252300/359152 [16:08<13:07, 135.64 examples/s]
Tokenizing Prompts (num_proc=64): 71%|███████ | 253300/359152 [16:21<16:05, 109.63 examples/s]
Tokenizing Prompts (num_proc=64): 71%|███████ | 253912/359152 [16:37<22:41, 77.30 examples/s]
Tokenizing Prompts (num_proc=64): 71%|███████ | 254912/359152 [16:39<15:44, 110.34 examples/s]
Tokenizing Prompts (num_proc=64): 71%|███████▏ | 255912/359152 [16:46<14:46, 116.42 examples/s]
Tokenizing Prompts (num_proc=64): 72%|███████▏ | 256912/359152 [16:50<11:49, 144.11 examples/s]
Tokenizing Prompts (num_proc=64): 72%|███████▏ | 257912/359152 [16:56<11:14, 149.99 examples/s]
Tokenizing Prompts (num_proc=64): 72%|███████▏ | 258912/359152 [17:03<11:40, 143.15 examples/s]
Tokenizing Prompts (num_proc=64): 72%|███████▏ | 259524/359152 [17:05<10:05, 164.44 examples/s]
Tokenizing Prompts (num_proc=64): 73%|███████▎ | 260524/359152 [17:13<10:48, 152.09 examples/s]
Tokenizing Prompts (num_proc=64): 73%|███████▎ | 261524/359152 [17:22<11:56, 136.20 examples/s]
Tokenizing Prompts (num_proc=64): 73%|███████▎ | 262524/359152 [17:25<10:03, 160.13 examples/s]
Tokenizing Prompts (num_proc=64): 73%|███████▎ | 263524/359152 [17:32<10:11, 156.50 examples/s]
Tokenizing Prompts (num_proc=64): 74%|███████▎ | 264524/359152 [17:36<08:52, 177.69 examples/s]
Steps to reproduce
- You need a dataset of roughly 350k samples at ~20k tokens each on average (a sketch for generating a synthetic dataset of this shape follows below)
- Put it into an axolotl config
- Run: axolotl train ...
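A hedged sketch of a generator for such a synthetic dataset, matching the conversation field layout in the config below (the file name and filler strategy are assumptions, not from the report):

# Sketch: write a synthetic chat dataset shaped like the config below expects
# (field_messages: conversation). All sizes and names here are assumptions.
import json
import random

with open("dummy_dataset.jsonl", "w") as f:
    for i in range(350_000):
        # roughly 20k tokens per sample on average, approximated with filler words
        filler = " ".join(["lorem"] * random.randint(10_000, 30_000))
        record = {
            "conversation": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"Sample {i}: {filler}"},
                {"role": "assistant", "content": "Understood."},
            ]
        }
        f.write(json.dumps(record) + "\n")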
Config yaml
base_model: mistralai/Mistral-Nemo-Base-2407
tokenizer_type: AutoTokenizer
strict: false
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
datasets:
  - path: your_dataset
    type: chat_template
    field_messages: conversation
    message_property_mappings:
      role: role
      content: content
    roles:
      system:
        - system
      user:
        - user
      assistant:
        - assistant
chat_template: chatml
dataset_processes: 64
dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./outputs/out
sequence_len: 65536
sample_packing: true
sample_packing_sequentially: true
pad_to_sequence_len: true
curriculum_sampling: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
sequence_parallel_degree: 1
gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_torch_8bit
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: offload_disk
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: zero3.json
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  eos_token: <|im_end|>
  pad_token: <|im_end|>
Possible solution
No response
Which Operating Systems are you using?
- [x] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.11
axolotl branch-commit
release 0.9.2
Acknowledgements
- [x] My issue title is concise, descriptive, and in title casing.
- [x] I have searched the existing issues to make sure this bug has not been reported yet.
- [x] I am using the latest version of axolotl.
- [x] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Do you have an example of a public dataset that we can repro this on?
> Do you have an example of a public dataset that we can repro this on?

Unfortunately, I don't.
Launching preprocessing in distributed mode is the main problem. You can probably create a dummy dataset of 1 million samples with 64k tokens each and try, but I cannot for the life of me avoid the timeout when using zero_first. Is it possible to remove this from the preprocessing step and achieve the same effect another way?
@retry_on_request_exceptions(max_retries=3, delay=5)
def prepare_dataset(cfg, tokenizer, processor=None, preprocess_iterable=None):
    prompters = []
    if not cfg.pretraining_dataset:
        with zero_first(is_local_main_process()):
            ...
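Per the traceback above, zero_first makes every non-main rank enter dist.barrier() while rank 0 tokenizes, so the other ranks sit in that barrier for the whole tokenization pass. A simplified sketch of the pattern (not axolotl's exact code):

from contextlib import contextmanager
import torch.distributed as dist

# Simplified zero_first-style pattern: non-main ranks block in a barrier
# while rank 0 runs the body, then roles flip so they can load the result.
@contextmanager
def zero_first(is_main: bool):
    if not is_main:
        dist.barrier()  # wait until rank 0 finishes the body
    yield
    if is_main:
        dist.barrier()  # rank 0 releases the waiting ranks

# If the body (tokenization) runs longer than the 30-minute store timeout,
# the waiting ranks raise DistBackendError, as in the log above.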
Maybe the axolotl preprocess CLI should not launch with accelerate? What do you think @winglian?
In the legacy docs, we had used python directly.
But digging into the CLI, we don't use accelerate for the preprocess.
I would try with CUDA_VISIBLE_DEVICES="" axolotl preprocess config.yaml
oh wait, are you using axolotl preprocess before axolotl train?
I used axolotl train, triggered the error, then pivoted to axolotl preprocess and hit the same error. I will need to check the commands again, but I'm pretty sure I can run the Python command instead.
This does the trick. Though, I would recommend using something other than with zero_first(is_local_main_process()) in general. It lowers QoL when using axolotl and could be replaced with a simpler FileLock system (sketched below, after the command): https://github.com/casper-hansen/FlashSamplePack/commit/c86cd04cf9e842acf48736ba8286502ff504237a
python -m axolotl.cli.preprocess axolotl_config.yaml
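For illustration, a minimal FileLock-based alternative in the spirit of the linked commit (a sketch under assumed names and paths, not the commit's actual code):

import os
from pathlib import Path
from filelock import FileLock  # pip install filelock

# Sketch: "first process prepares, the rest reuse" via a file lock instead
# of a dist.barrier(). prepare_once and prepare_fn are hypothetical names.
def prepare_once(prepared_path: str, prepare_fn):
    os.makedirs(prepared_path, exist_ok=True)
    done_marker = Path(prepared_path) / ".prepared"
    with FileLock(str(Path(prepared_path) / ".prepare.lock")):
        # the first process to take the lock does the work; the others
        # block on the lock (no NCCL timeout involved) and then skip it
        if not done_marker.exists():
            prepare_fn()  # e.g. tokenize and cache to prepared_path
            done_marker.touch()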
@casper-hansen agreed, feel free to make a PR! Or, I'll probably do so later.
> @casper-hansen agreed, feel free to make a PR! Or, I'll probably do so later.
I probably won't be creating the PR, but let's leave this issue open until a solution is in place.
FYI: the current plan is to roll this into my data loading refactor.