Triton 3.2.0 Doesn't Work with GRPO+vllm
Please check that this issue hasn't been reported before.
- [x] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
GRPO training as detailed in the docs should just work (launch the vllm server command, then launch the train command).
Current behaviour
Instead, the axolotl commands execute, and I get an error about PY_SSIZE_T_CLEAN being incorrectly set. I tried downgrading pytorch to 2.5.1, but that did nothing. What eventually worked was installing axolotl in a clean conda env in the following way:
pip3 install -U packaging==23.2 setuptools==75.8.0 wheel ninja
module load cuda/12.4
which nvcc
pip install wandb
pip3 install torch torchvision torchaudio
pip3 install --no-build-isolation --verbose axolotl[flash-attn,deepspeed,vllm]
pip install triton==3.1.0
axolotl
So basically I uninstalled triton 3.2.0 and installed 3.1.0. I still get a torch dynamo error, but it doesn't kill the service and the GRPO training continues.
This bug only happens with GRPO training; the quickstart example with fetch + finetune doesn't throw any errors.
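To make the workaround above easier to verify, here is a minimal sketch that prints the installed versions of the packages mentioned in this report and flags a triton 3.2.x install. The package list and the helper names (`installed_versions`, `triton_needs_pin`) are my own assumptions for illustration, not part of axolotl.

```python
from importlib import metadata

# Distribution names mentioned in this report (an assumption; adjust as needed).
PACKAGES = ("torch", "triton", "vllm", "axolotl", "trl", "transformers")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def triton_needs_pin(version, bad_prefix="3.2."):
    """True when the triton version is in the series that failed in this report."""
    return version is not None and version.startswith(bad_prefix)


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name:>14}: {version or 'not installed'}")
    if triton_needs_pin(found["triton"]):
        print("triton 3.2.x detected; this report worked around it with triton==3.1.0")
```

Running it inside the training environment before launching the job shows whether the `pip install triton==3.1.0` step actually took effect.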
Steps to reproduce
Install axolotl as instructed in the main README with Python 3.11.
Config yaml
base_model: Qwen/Qwen2.5-1.5B-Instruct
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name
load_in_8bit: false
load_in_4bit: false
strict: false
torch_compile: true
vllm:
  host: 0.0.0.0
  port: 8000
  tensor_parallel_size: 2
  gpu_memory_utilization: 0.85
  dtype: auto
  # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand
rl: grpo
trl:
  use_vllm: true
  vllm_server_host: localhost
  vllm_server_port: 8000
  vllm_server_timeout: 300
  beta: 0.001
  max_completion_length: 512
  use_vllm: true
  reward_funcs:
    - gsm8k_grpo.correctness_reward_func
    - gsm8k_grpo.int_reward_func
    - gsm8k_grpo.strict_format_reward_func
    - gsm8k_grpo.soft_format_reward_func
    - gsm8k_grpo.xmlcount_reward_func
  vllm_gpu_memory_utilization: 0.9
  vllm_max_model_len: 800
  num_generations: 16
chat_template: qwen_25
datasets:
  - path: skrishna/gsm8k_only_answer
    type: gsm8k_grpo.axo_gsm8k_transform
dataset_prepared_path: /gscratch/clmbr/revr/LRMGraph/workspace/data/last_run_prepared
skip_prepare_dataset: true
val_set_size: 0.0
output_dir: /gscratch/clmbr/revr/LRMGraph/workspace/data/axolotl-artifacts/r1-outputs
dataloader_prefetch_factor: 32
dataloader_num_workers: 2
dataloader_pin_memory: true
gc_steps: 1
sequence_len: 800
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false
wandb_project: gsm8k-grpo-proj
wandb_entity:
wandb_name: rev2021-university-of-washington
gradient_accumulation_steps: 8
micro_batch_size: 16 # should match num_generations / num_gpus
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: constant_with_warmup
learning_rate: 1.0e-6
max_grad_norm: 1.0
weight_decay: 0.1
bf16: true
tf32: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true
logging_steps: 1
warmup_steps: 100
evals_per_epoch: 1
saves_per_epoch: 4
Possible solution
Downgrade triton? I have heard that people using unsloth run into similar problems.
Which Operating Systems are you using?
- [x] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.11
axolotl branch-commit
latest published on pypi
Acknowledgements
- [x] My issue title is concise, descriptive, and in title casing.
- [x] I have searched the existing issues to make sure this bug has not been reported yet.
- [x] I am using the latest version of axolotl.
- [x] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Part of the stack trace (this is present even with the triton downgrade, though the job still runs)
[out, 3:47, 238s, g3118] [2025-04-12 17:49:53,069] [INFO] [axolotl.callbacks.on_train_begin:811] [PID:26094] [RANK:0] The Axolotl config has been saved to the WandB run under files.
[err, 3:47, 238s, g3118] wandb: WARNING Saving files without folders. If you want to preserve subdirectories pass base_path to wandb.save, i.e. wandb.save("/mnt/folder/file.h5", base_path="/mnt")
[err, 3:47, 238s, g3118]
[err, 3:47, 238s, g3118] 0%| | 0/934 [00:00<?, ?it/s]
[err, 3:47, 238s, g3118] Processed prompts: 0%| | 0/16 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
[out, 3:52, 242s, g3118] INFO: 127.0.0.1:57280 - "POST /generate/ HTTP/1.1" 200 OK
[err, 3:52, 242s, g3118]
[err, 3:52, 242s, g3118] Processed prompts: 100%|██████████| 16/16 [00:03<00:00, 4.40it/s, est. speed input: 770.79 toks/s, output: 1201.60 toks/s]
[err, 3:52, 242s, g3118] Processed prompts: 100%|██████████| 16/16 [00:03<00:00, 4.40it/s, est. speed input: 770.79 toks/s, output: 1201.60 toks/s]
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break from `Tensor.item()`, consider setting:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] torch._dynamo.config.capture_scalar_outputs = True
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] or:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] to include these operations in the captured graph.
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break: from user code at:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return model_forward(*args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return func(*args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] output = func(self, *args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return func(*args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] outputs: BaseModelOutputWithPast = self.model(
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] output = func(self, *args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 519, in forward
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] causal_mask = self._update_causal_mask(
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 596, in _update_causal_mask
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] if attention_mask is not None and 0.0 in attention_mask:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
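The log above is hard to read because every line carries a `[err, 3:52, 242s, g3118]`-style prefix plus the long torch warning header. Here is a rough sketch for separating the dynamo warning lines from everything else (such as a traceback). The regular expressions are assumptions based only on the lines shown in this report, and `split_log` is a hypothetical helper name.

```python
import re

# Prefix seen on every line in this report, e.g. "[err, 2:16, 162s, g3115] ".
PREFIX = re.compile(r"^\[(?:out|err), \d+:\d+, \d+s, \w+\]\s?")
# Warning header as it appears in this report, e.g.
# "W0412 17:49:58.251000 26094 /path/to/tensor.py:869] [0/0]".
DYNAMO_WARNING = re.compile(r"^W\d{4} \d{2}:\d{2}:\d{2}\.\d+ \d+ \S+\] \[\d+/\d+\]")


def split_log(lines):
    """Strip the per-line prefix and return (warning_lines, other_lines)."""
    warnings, other = [], []
    for raw in lines:
        line = PREFIX.sub("", raw.rstrip("\n"))
        if not line.strip():
            continue  # drop lines that were empty apart from the prefix
        (warnings if DYNAMO_WARNING.match(line) else other).append(line)
    return warnings, other


if __name__ == "__main__":
    import sys

    # Reads a saved log from stdin and prints only the non-warning lines.
    _, rest = split_log(sys.stdin)
    print("\n".join(rest))
```

Piping a saved log through this leaves the traceback and INFO lines, which is usually the part worth pasting into an issue.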
Graph breaks are normal. Usually just means the modeling code isn't optimized for triton/compile.
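For anyone who wants to try the setting the warning itself suggests, here is a small sketch that applies both forms named in the log: the `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1` environment variable and `torch._dynamo.config.capture_scalar_outputs = True`. The function name `enable_scalar_capture` is hypothetical, and this only targets the `Tensor.item()` graph-break warning. It may not remove this particular break, because the flagged line branches on tensor data, and it is not expected to affect the PY_SSIZE_T_CLEAN failure.

```python
import os


def enable_scalar_capture() -> bool:
    """Apply the setting suggested by the graph-break warning.

    Returns True if torch was importable and configured in this process,
    False if only the environment variable could be set.
    """
    # Set the variable so that processes launched from this one inherit it.
    os.environ["TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS"] = "1"
    try:
        import torch._dynamo.config as dynamo_config
    except ImportError:
        return False
    # Same switch, applied directly in case torch was already imported.
    dynamo_config.capture_scalar_outputs = True
    return True


if __name__ == "__main__":
    configured = enable_scalar_capture()
    print("torch configured in-process:", configured)
```

When launching through the axolotl CLI, exporting the environment variable in the shell before the train command is the simpler route, since the trainer runs in its own process.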
@winglian that is good to know. What about the triton 3.2.0 issue that throws the PY_SSIZE_T_CLEAN error?
Do you have the stack trace for that?
@NanoCode012, here is the stack trace!
[err, 1:25, 111s, g3115] Processed prompts: 0%| | 0/16 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
[out, 1:28, 114s, g3115] INFO: 127.0.0.1:52414 - "POST /generate/ HTTP/1.1" 200 OK
[err, 1:28, 114s, g3115]
[err, 1:28, 114s, g3115] Processed prompts: 100%|██████████| 16/16 [00:03<00:00, 4.01it/s, est. speed input: 400.91 toks/s, output: 1423.23 toks/s]
[err, 1:28, 114s, g3115] Processed prompts: 100%|██████████| 16/16 [00:03<00:00, 4.01it/s, est. speed input: 400.91 toks/s, output: 1423.23 toks/s]
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break from Tensor.item(), consider setting:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] torch._dynamo.config.capture_scalar_outputs = True
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] or:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] to include these operations in the captured graph.
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break: from user code at:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return model_forward(*args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return func(*args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] output = func(self, *args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] return func(*args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] outputs: BaseModelOutputWithPast = self.model(
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] output = func(self, *args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 519, in forward
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] causal_mask = self._update_causal_mask(
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 596, in _update_causal_mask
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] if attention_mask is not None and 0.0 in attention_mask:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[out, 2:16, 162s, g3115] wandb:
[out, 2:16, 162s, g3115] wandb: 🚀 View run /gscratch/clmbr/revr/LRMGraph/workspace/data/axolotl-artifacts/r1-outputs at: https://wandb.ai/rev2021-university-of-washington/gsm8k-grpo-proj/runs/su8l84sj
[out, 2:16, 162s, g3115] wandb: Find logs at: ../../../../../../mmfs1/gscratch/clmbr/revr/LRMGraph/scripts/axolotl/wandb/run-20250411_220735-su8l84sj/logs
[err, 2:16, 162s, g3115] Traceback (most recent call last):
[err, 2:16, 162s, g3115] File "<frozen runpy>", line 198, in _run_module_as_main
[err, 2:16, 162s, g3115] File "<frozen runpy>", line 88, in _run_code
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 117, in <module>
[err, 2:16, 162s, g3115] fire.Fire(do_cli)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
[err, 2:16, 162s, g3115] component_trace = _Fire(component, args, parsed_flag_args, context, name)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
[err, 2:16, 162s, g3115] component, remaining_args = _CallAndUpdateTrace(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[err, 2:16, 162s, g3115] component = fn(*varargs, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 91, in do_cli
[err, 2:16, 162s, g3115] return do_train(parsed_cfg, parsed_cli_args)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 50, in do_train
[err, 2:16, 162s, g3115] model, tokenizer, trainer = train(cfg=cfg, dataset_meta=dataset_meta)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 507, in train
[err, 2:16, 162s, g3115] execute_training(cfg, trainer, resume_from_checkpoint)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 193, in execute_training
[err, 2:16, 162s, g3115] trainer.train(resume_from_checkpoint=resume_from_checkpoint)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2245, in train
[err, 2:16, 162s, g3115] return inner_training_loop(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
[err, 2:16, 162s, g3115] tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 3730, in training_step
[err, 2:16, 162s, g3115] inputs = self._prepare_inputs(inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115] return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 647, in _prepare_inputs
[err, 2:16, 162s, g3115] inputs = self._generate_and_score_completions(inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 753, in _generate_and_score_completions
[err, 2:16, 162s, g3115] ref_per_token_logps = self._get_per_token_logps(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115] return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 589, in _get_per_token_logps
[err, 2:16, 162s, g3115] logits = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115] return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115] return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
[err, 2:16, 162s, g3115] return fn(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115] return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115] return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 2:16, 162s, g3115] return model_forward(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 2:16, 162s, g3115] return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 2:16, 162s, g3115] return func(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115] output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 2:16, 162s, g3115] return func(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 2:16, 162s, g3115] outputs: BaseModelOutputWithPast = self.model(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115] return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115] return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115] output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
[err, 2:16, 162s, g3115] return self._torchdynamo_orig_callable(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1164, in __call__
[err, 2:16, 162s, g3115] result = self._inner_convert(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
[err, 2:16, 162s, g3115] return _compile(
[err, 2:16, 162s, g3115] ^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
[err, 2:16, 162s, g3115] guarded_code = compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner
[err, 2:16, 162s, g3115] return _compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
[err, 2:16, 162s, g3115] return function(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner
[err, 2:16, 162s, g3115] out_code = transform_code_object(code, transform)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
[err, 2:16, 162s, g3115] transformations(instructions, code_options)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn
[err, 2:16, 162s, g3115] return fn(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform
[err, 2:16, 162s, g3115] tracer.run()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run
[err, 2:16, 162s, g3115] super().run()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run
[err, 2:16, 162s, g3115] while self.step():
[err, 2:16, 162s, g3115] ^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step
[err, 2:16, 162s, g3115] self.dispatch_table[inst.opcode](self, inst)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 657, in wrapper
[err, 2:16, 162s, g3115] return handle_graph_break(self, inst, speculation.reason)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 698, in handle_graph_break
[err, 2:16, 162s, g3115] self.output.compile_subgraph(self, reason=reason)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1136, in compile_subgraph
[err, 2:16, 162s, g3115] self.compile_and_call_fx_graph(
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph
[err, 2:16, 162s, g3115] compiled_fn = self.call_user_compiler(gm)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler
[err, 2:16, 162s, g3115] return self._call_user_compiler(gm)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler
[err, 2:16, 162s, g3115] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler
[err, 2:16, 162s, g3115] compiled_fn = compiler_fn(gm, self.example_inputs())
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
[err, 2:16, 162s, g3115] compiled_gm = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/__init__.py", line 2340, in __call__
[err, 2:16, 162s, g3115] return compile_fx(model_, inputs_, config_patches=self.config)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx
[err, 2:16, 162s, g3115] return aot_autograd(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__
[err, 2:16, 162s, g3115] cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified
[err, 2:16, 162s, g3115] compiled_fn = dispatch_and_compile()
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
[err, 2:16, 162s, g3115] compiled_fn, _ = create_aot_dispatcher_function(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
[err, 2:16, 162s, g3115] return _create_aot_dispatcher_function(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
[err, 2:16, 162s, g3115] compiled_fn, fw_metadata = compiler_fn(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base
[err, 2:16, 162s, g3115] compiled_fw = compiler(fw_module, updated_flat_args)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__
[err, 2:16, 162s, g3115] return self.compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base
[err, 2:16, 162s, g3115] return inner_compile(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner
[err, 2:16, 162s, g3115] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
[err, 2:16, 162s, g3115] inner_compiled_fn = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner
[err, 2:16, 162s, g3115] mb_compiled_graph = fx_codegen_and_compile(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
[err, 2:16, 162s, g3115] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
[err, 2:16, 162s, g3115] compiled_fn = graph.compile_to_module().call
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module
[err, 2:16, 162s, g3115] return self._compile_to_module()
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module
[err, 2:16, 162s, g3115] mod = PyCodeCache.load_by_key_path(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path
[err, 2:16, 162s, g3115] mod = _reload_python_module(key, path)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module
[err, 2:16, 162s, g3115] exec(code, mod.__dict__, mod.__dict__)
[err, 2:16, 162s, g3115] File "/tmp/torchinductor_revr/fa/cfaf5f3d7s6g43rxlw4wh76xmjqkr37r3j6wa4w5tgidlfdxtsyk.py", line 48, in <module>
[err, 2:16, 162s, g3115] triton_poi_fused_embedding_0 = async_compile.triton('triton_poi_fused_embedding_0', '''
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 213, in triton
[err, 2:16, 162s, g3115] kernel.precompile()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile
[err, 2:16, 162s, g3115] compiled_binary, launcher = self._precompile_config(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 520, in _precompile_config
[err, 2:16, 162s, g3115] binary._init_handles()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/triton/compiler/compiler.py", line 390, in _init_handles
[err, 2:16, 162s, g3115] self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[err, 2:16, 162s, g3115] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] You can suppress this exception and fall back to eager by setting:
[err, 2:16, 162s, g3115] import torch._dynamo
[err, 2:16, 162s, g3115] torch._dynamo.config.suppress_errors = True
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] Traceback (most recent call last):
[err, 2:16, 162s, g3115] File "<frozen runpy>", line 198, in _run_module_as_main
[err, 2:16, 162s, g3115] File "<frozen runpy>", line 88, in _run_code
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 117, in <module>
[err, 2:16, 162s, g3115] fire.Fire(do_cli)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
[err, 2:16, 162s, g3115] component_trace = _Fire(component, args, parsed_flag_args, context, name)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
[err, 2:16, 162s, g3115] component, remaining_args = _CallAndUpdateTrace(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[err, 2:16, 162s, g3115] component = fn(*varargs, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 91, in do_cli
[err, 2:16, 162s, g3115] return do_train(parsed_cfg, parsed_cli_args)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 50, in do_train
[err, 2:16, 162s, g3115] model, tokenizer, trainer = train(cfg=cfg, dataset_meta=dataset_meta)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 507, in train
[err, 2:16, 162s, g3115] execute_training(cfg, trainer, resume_from_checkpoint)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 193, in execute_training
[err, 2:16, 162s, g3115] trainer.train(resume_from_checkpoint=resume_from_checkpoint)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2245, in train
[err, 2:16, 162s, g3115] return inner_training_loop(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
[err, 2:16, 162s, g3115] tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 3730, in training_step
[err, 2:16, 162s, g3115] inputs = self._prepare_inputs(inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115] return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 647, in _prepare_inputs
[err, 2:16, 162s, g3115] inputs = self._generate_and_score_completions(inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 753, in _generate_and_score_completions
[err, 2:16, 162s, g3115] ref_per_token_logps = self._get_per_token_logps(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115] return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 589, in _get_per_token_logps
[err, 2:16, 162s, g3115] logits = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115] return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115] return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
[err, 2:16, 162s, g3115] return fn(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115] return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115] return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 2:16, 162s, g3115] return model_forward(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 2:16, 162s, g3115] return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 2:16, 162s, g3115] return func(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115] output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 2:16, 162s, g3115] return func(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 2:16, 162s, g3115] outputs: BaseModelOutputWithPast = self.model(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115] return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115] return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115] output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
[err, 2:16, 162s, g3115] return self._torchdynamo_orig_callable(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1164, in __call__
[err, 2:16, 162s, g3115] result = self._inner_convert(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
[err, 2:16, 162s, g3115] return _compile(
[err, 2:16, 162s, g3115] ^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
[err, 2:16, 162s, g3115] guarded_code = compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner
[err, 2:16, 162s, g3115] return _compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
[err, 2:16, 162s, g3115] return function(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner
[err, 2:16, 162s, g3115] out_code = transform_code_object(code, transform)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
[err, 2:16, 162s, g3115] transformations(instructions, code_options)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn
[err, 2:16, 162s, g3115] return fn(*args, **kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform
[err, 2:16, 162s, g3115] tracer.run()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run
[err, 2:16, 162s, g3115] super().run()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run
[err, 2:16, 162s, g3115] while self.step():
[err, 2:16, 162s, g3115] ^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step
[err, 2:16, 162s, g3115] self.dispatch_table[inst.opcode](self, inst)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 657, in wrapper
[err, 2:16, 162s, g3115] return handle_graph_break(self, inst, speculation.reason)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 698, in handle_graph_break
[err, 2:16, 162s, g3115] self.output.compile_subgraph(self, reason=reason)
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1136, in compile_subgraph
[err, 2:16, 162s, g3115] self.compile_and_call_fx_graph(
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph
[err, 2:16, 162s, g3115] compiled_fn = self.call_user_compiler(gm)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler
[err, 2:16, 162s, g3115] return self._call_user_compiler(gm)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler
[err, 2:16, 162s, g3115] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler
[err, 2:16, 162s, g3115] compiled_fn = compiler_fn(gm, self.example_inputs())
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
[err, 2:16, 162s, g3115] compiled_gm = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/__init__.py", line 2340, in __call__
[err, 2:16, 162s, g3115] return compile_fx(model_, inputs_, config_patches=self.config)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx
[err, 2:16, 162s, g3115] return aot_autograd(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__
[err, 2:16, 162s, g3115] cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified
[err, 2:16, 162s, g3115] compiled_fn = dispatch_and_compile()
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
[err, 2:16, 162s, g3115] compiled_fn, _ = create_aot_dispatcher_function(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
[err, 2:16, 162s, g3115] return _create_aot_dispatcher_function(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
[err, 2:16, 162s, g3115] compiled_fn, fw_metadata = compiler_fn(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base
[err, 2:16, 162s, g3115] compiled_fw = compiler(fw_module, updated_flat_args)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__
[err, 2:16, 162s, g3115] return self.compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base
[err, 2:16, 162s, g3115] return inner_compile(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner
[err, 2:16, 162s, g3115] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
[err, 2:16, 162s, g3115] inner_compiled_fn = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner
[err, 2:16, 162s, g3115] mb_compiled_graph = fx_codegen_and_compile(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
[err, 2:16, 162s, g3115] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
[err, 2:16, 162s, g3115] compiled_fn = graph.compile_to_module().call
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module
[err, 2:16, 162s, g3115] return self._compile_to_module()
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module
[err, 2:16, 162s, g3115] mod = PyCodeCache.load_by_key_path(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path
[err, 2:16, 162s, g3115] mod = _reload_python_module(key, path)
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module
[err, 2:16, 162s, g3115] exec(code, mod.__dict__, mod.__dict__)
[err, 2:16, 162s, g3115] File "/tmp/torchinductor_revr/fa/cfaf5f3d7s6g43rxlw4wh76xmjqkr37r3j6wa4w5tgidlfdxtsyk.py", line 48, in <module>
[err, 2:16, 162s, g3115] triton_poi_fused_embedding_0 = async_compile.triton('triton_poi_fused_embedding_0', '''
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 213, in triton
[err, 2:16, 162s, g3115] kernel.precompile()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile
[err, 2:16, 162s, g3115] compiled_binary, launcher = self._precompile_config(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 520, in _precompile_config
[err, 2:16, 162s, g3115] binary._init_handles()
[err, 2:16, 162s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/triton/compiler/compiler.py", line 390, in _init_handles
[err, 2:16, 162s, g3115] self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
[err, 2:16, 162s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[err, 2:16, 162s, g3115] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] You can suppress this exception and fall back to eager by setting:
[err, 2:16, 162s, g3115] import torch._dynamo
[err, 2:16, 162s, g3115] torch._dynamo.config.suppress_errors = True
[err, 2:16, 162s, g3115]
[out, 2:17, 164s, g3115] INFO: 127.0.0.1:36960 - "POST /close_communicator/ HTTP/1.1" 200 OK
[out, 2:22, 168s, g3115] [2025-04-11 22:08:36,407] [ERROR] [root.train:178] [PID:17959] Failed to train/fine-tune config 'gsm8k.yaml': Command '['accelerate', 'launch', '--num_processes', '1', '-m', 'axolotl.cli.train', 'gsm8k.yaml', '--debug-num-examples', '0']' returned non-zero exit status 1.
[err, 2:22, 168s, g3115] Traceback (most recent call last):
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/bin/accelerate", line 8, in <module>
[err, 2:22, 168s, g3115] sys.exit(main())
[err, 2:22, 168s, g3115] ^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
[err, 2:22, 168s, g3115] args.func(args)
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1213, in launch_command
[err, 2:22, 168s, g3115] simple_launcher(args)
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 795, in simple_launcher
[err, 2:22, 168s, g3115] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[err, 2:22, 168s, g3115] subprocess.CalledProcessError: Command '['/gscratch/clmbr/revr/LRMGraph/axolotl_env/bin/python', '-m', 'axolotl.cli.train', 'gsm8k.yaml', '--debug-num-examples', '0']' returned non-zero exit status 1.
[err, 2:22, 168s, g3115] Traceback (most recent call last):
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/bin/axolotl", line 8, in <module>
[err, 2:22, 168s, g3115] sys.exit(main())
[err, 2:22, 168s, g3115] ^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/main.py", line 337, in main
[err, 2:22, 168s, g3115] cli()
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
[err, 2:22, 168s, g3115] return self.main(*args, **kwargs)
[err, 2:22, 168s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1082, in main
[err, 2:22, 168s, g3115] rv = self.invoke(ctx)
[err, 2:22, 168s, g3115] ^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
[err, 2:22, 168s, g3115] return _process_result(sub_ctx.command.invoke(sub_ctx))
[err, 2:22, 168s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
[err, 2:22, 168s, g3115] return ctx.invoke(self.callback, **ctx.params)
[err, 2:22, 168s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 788, in invoke
[err, 2:22, 168s, g3115] return __callback(*args, **kwargs)
[err, 2:22, 168s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/utils.py", line 66, in wrapper
[err, 2:22, 168s, g3115] return func(*args, **filtered_kwargs)
[err, 2:22, 168s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/main.py", line 180, in train
[err, 2:22, 168s, g3115] raise exc
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/main.py", line 165, in train
[err, 2:22, 168s, g3115] subprocess.run(cmd, check=True) # nosec B603
[err, 2:22, 168s, g3115] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115] File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/subprocess.py", line 573, in run
[err, 2:22, 168s, g3115] raise CalledProcessError(retcode, process.args,
[err, 2:22, 168s, g3115] subprocess.CalledProcessError: Command '['accelerate', 'launch', '--num_processes', '1', '-m', 'axolotl.cli.train', 'gsm8k.yaml', '--debug-num-examples', '0']' returned non-zero exit status 1.
@RevanthRameshkumar could you try running with `torch_compile: false`?
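
For reference, this is a one-line change to the config posted above. A minimal sketch, assuming the file is the `gsm8k.yaml` named in the log and that every other key stays as it is:

```yaml
# gsm8k.yaml (only the changed key is shown; the rest of the config is unchanged)
# Turning compilation off should keep the trainer from going through the
# TorchDynamo path that raised the PY_SSIZE_T_CLEAN SystemError in the log.
# This is a diagnostic step, not a confirmed fix.
torch_compile: false
```

If training then runs on triton 3.2.0, that would point to the compile step rather than the GRPO or vLLM integration itself.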