
Triton 3.2.0 Doesn't Work with GRPO + vLLM

Open · RevanthRameshkumar opened this issue 8 months ago · 6 comments

Please check that this issue hasn't been reported before.

  • [x] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

GRPO training as described in the docs should just work: launch the vLLM server command, then launch the train command.

Current behaviour

Instead, the axolotl commands start, but I get an error complaining that PY_SSIZE_T_CLEAN is not set correctly. Downgrading PyTorch to 2.5.1 did not help. What eventually worked was installing axolotl as follows (in a clean conda env):

pip3 install -U packaging==23.2 setuptools==75.8.0 wheel ninja

module load cuda/12.4
which nvcc


pip install wandb

pip3 install torch torchvision torchaudio
pip3 install --no-build-isolation --verbose 'axolotl[flash-attn,deepspeed,vllm]'
pip install triton==3.1.0

axolotl

In short, I uninstall Triton 3.2.0 and install 3.1.0. I still get a TorchDynamo error, but it doesn't kill the service and GRPO training continues.
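The workaround above pins Triton to 3.1.0. A minimal sketch for confirming which Triton build is actually installed before launching GRPO, assuming (only on the strength of this report) that the 3.2.x series is the problematic one. The helper names are hypothetical, and the check only reports the version; it does not fix anything.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption based only on this issue: Triton 3.2.x fails with GRPO + vLLM,
# while 3.1.0 works.
REPORTED_BAD_MAJOR_MINOR = (3, 2)


def parse_major_minor(ver: str) -> tuple[int, int]:
    """Return (major, minor) from strings like '3.2.0' or '3.2.0+git1234'."""
    base = ver.split("+", 1)[0]  # drop any local version suffix
    parts = base.split(".")
    return int(parts[0]), int(parts[1])


def is_reported_problematic(ver: str) -> bool:
    """True if the version falls in the series reported as broken here."""
    return parse_major_minor(ver) == REPORTED_BAD_MAJOR_MINOR


def check_installed_triton() -> str:
    """Describe the installed Triton version, flagging the reported series."""
    try:
        ver = version("triton")
    except PackageNotFoundError:
        return "triton is not installed"
    if is_reported_problematic(ver):
        return f"triton {ver} installed; this issue reports problems with 3.2.x (3.1.0 worked)"
    return f"triton {ver} installed"


if __name__ == "__main__":
    print(check_installed_triton())
```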

This bug only happens with GRPO training; the quickstart example (fetch + fine-tune) doesn't throw any errors.

Steps to reproduce

Install axolotl as instructed in the main README, using Python 3.11.

Config yaml

base_model: Qwen/Qwen2.5-1.5B-Instruct
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

load_in_8bit: false
load_in_4bit: false
strict: false

torch_compile: true

vllm:
    host: 0.0.0.0
    port: 8000
    tensor_parallel_size: 2
    gpu_memory_utilization: 0.85
    dtype: auto
    # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand

rl: grpo
trl:
  use_vllm: true
  vllm_server_host: localhost
  vllm_server_port: 8000
  vllm_server_timeout: 300
  beta: 0.001
  max_completion_length: 512
  reward_funcs:
    - gsm8k_grpo.correctness_reward_func
    - gsm8k_grpo.int_reward_func
    - gsm8k_grpo.strict_format_reward_func
    - gsm8k_grpo.soft_format_reward_func
    - gsm8k_grpo.xmlcount_reward_func
  vllm_gpu_memory_utilization: 0.9
  vllm_max_model_len: 800
  num_generations: 16

chat_template: qwen_25
datasets:
  - path: skrishna/gsm8k_only_answer
    type: gsm8k_grpo.axo_gsm8k_transform
dataset_prepared_path: /gscratch/clmbr/revr/LRMGraph/workspace/data/last_run_prepared
skip_prepare_dataset: true
val_set_size: 0.0
output_dir: /gscratch/clmbr/revr/LRMGraph/workspace/data/axolotl-artifacts/r1-outputs

dataloader_prefetch_factor: 32
dataloader_num_workers: 2
dataloader_pin_memory: true

gc_steps: 1

sequence_len: 800
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: gsm8k-grpo-proj
wandb_entity:
wandb_name: rev2021-university-of-washington

gradient_accumulation_steps: 8
micro_batch_size: 16  # should match num_generations / num_gpus
num_epochs: 1

optimizer: adamw_torch_fused
lr_scheduler: constant_with_warmup
learning_rate: 1.0e-6
max_grad_norm: 1.0
weight_decay: 0.1

bf16: true
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true

logging_steps: 1
warmup_steps: 100
evals_per_epoch: 1
saves_per_epoch: 4
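The `micro_batch_size` comment in the config points at a constraint between batch size and `num_generations`. A minimal sketch of that check, assuming the rule enforced by TRL's `GRPOTrainer` in releases from around this time (the global batch, per-device batch size times the number of training processes, must be divisible by `num_generations`, which must be at least 2) and that axolotl passes `micro_batch_size` through as `per_device_train_batch_size`. `grpo_batch_ok` is a hypothetical helper, not an axolotl or TRL function.

```python
def grpo_batch_ok(micro_batch_size: int, num_processes: int, num_generations: int) -> bool:
    """Return True if the global batch can be split into whole GRPO groups.

    Sketch of the divisibility rule assumed above: every prompt needs
    num_generations completions, so the global batch must hold whole groups.
    """
    global_batch_size = micro_batch_size * num_processes
    return num_generations >= 2 and global_batch_size % num_generations == 0


if __name__ == "__main__":
    # Values from the config above; 2 training processes is an assumption,
    # since the report does not say how many GPUs ran the trainer.
    print(grpo_batch_ok(micro_batch_size=16, num_processes=2, num_generations=16))
```

With 16 generations, any per-device batch whose global total is a multiple of 16 passes; a global batch of 6 would not.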

Possible solution

Downgrade Triton? I've heard that people using Unsloth have similar problems.

Which Operating Systems are you using?

  • [x] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

3.11

axolotl branch-commit

Latest release published on PyPI

Acknowledgements

  • [x] My issue title is concise, descriptive, and in title casing.
  • [x] I have searched the existing issues to make sure this bug has not been reported yet.
  • [x] I am using the latest version of axolotl.
  • [x] I have provided enough information for the maintainers to reproduce and diagnose the issue.

RevanthRameshkumar · Apr 12 '25 20:04

Part of the stack trace (this appears even with the Triton downgrade, though the job keeps running):

[out, 3:47, 238s, g3118] [2025-04-12 17:49:53,069] [INFO] [axolotl.callbacks.on_train_begin:811] [PID:26094] [RANK:0] The Axolotl config has been saved to the WandB run under files.
[err, 3:47, 238s, g3118] wandb: WARNING Saving files without folders. If you want to preserve subdirectories pass base_path to wandb.save, i.e. wandb.save("/mnt/folder/file.h5", base_path="/mnt")
[err, 3:47, 238s, g3118]
[err, 3:47, 238s, g3118]   0%|          | 0/934 [00:00<?, ?it/s]
[err, 3:47, 238s, g3118] Processed prompts:   0%|          | 0/16 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
[out, 3:52, 242s, g3118] INFO:     127.0.0.1:57280 - "POST /generate/ HTTP/1.1" 200 OK
[err, 3:52, 242s, g3118]
[err, 3:52, 242s, g3118] Processed prompts: 100%|██████████| 16/16 [00:03<00:00,  4.40it/s, est. speed input: 770.79 toks/s, output: 1201.60 toks/s]
[err, 3:52, 242s, g3118] Processed prompts: 100%|██████████| 16/16 [00:03<00:00,  4.40it/s, est. speed input: 770.79 toks/s, output: 1201.60 toks/s]
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break from `Tensor.item()`, consider setting:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     torch._dynamo.config.capture_scalar_outputs = True
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] or:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] to include these operations in the captured graph.
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break: from user code at:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return model_forward(*args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return func(*args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     output = func(self, *args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return func(*args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     outputs: BaseModelOutputWithPast = self.model(
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     output = func(self, *args, **kwargs)
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 519, in forward
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     causal_mask = self._update_causal_mask(
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 596, in _update_causal_mask
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     if attention_mask is not None and 0.0 in attention_mask:
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 3:52, 242s, g3118] W0412 17:49:58.251000 26094 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.11/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]

RevanthRameshkumar · Apr 13 '25 00:04

Graph breaks are normal. They usually just mean the modeling code isn't optimized for Triton/torch.compile.

winglian · Apr 13 '25 00:04

@winglian that is good to know. What about the Triton 3.2.0 issue that throws the PY_SSIZE_T_CLEAN error?

RevanthRameshkumar · Apr 13 '25 04:04

> @winglian that is good to know. What about the Triton 3.2.0 issue that throws the PY_SSIZE_T_CLEAN error?

Do you have the stack trace for that?

NanoCode012 · Apr 16 '25 07:04

@NanoCode012, here is the stack trace:

[err, 1:25, 111s, g3115] Processed prompts:   0%|          | 0/16 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
[out, 1:28, 114s, g3115] INFO:     127.0.0.1:52414 - "POST /generate/ HTTP/1.1" 200 OK
[err, 1:28, 114s, g3115]
[err, 1:28, 114s, g3115] Processed prompts: 100%|██████████| 16/16 [00:03<00:00,  4.01it/s, est. speed input: 400.91 toks/s, output: 1423.23 toks/s]
[err, 1:28, 114s, g3115] Processed prompts: 100%|██████████| 16/16 [00:03<00:00,  4.01it/s, est. speed input: 400.91 toks/s, output: 1423.23 toks/s]
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break from Tensor.item(), consider setting:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     torch._dynamo.config.capture_scalar_outputs = True
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] or:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] to include these operations in the captured graph.
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0] Graph break: from user code at:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return model_forward(*args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return func(*args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     output = func(self, *args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     return func(*args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     outputs: BaseModelOutputWithPast = self.model(
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     output = func(self, *args, **kwargs)
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 519, in forward
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     causal_mask = self._update_causal_mask(
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 596, in _update_causal_mask
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]     if attention_mask is not None and 0.0 in attention_mask:
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[err, 1:31, 117s, g3115] W0411 22:07:44.901000 30330 /mmfs1/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py:869] [0/0]
[out, 2:16, 162s, g3115] wandb:
[out, 2:16, 162s, g3115] wandb: 🚀 View run /gscratch/clmbr/revr/LRMGraph/workspace/data/axolotl-artifacts/r1-outputs at: https://wandb.ai/rev2021-university-of-washington/gsm8k-grpo-proj/runs/su8l84sj
[out, 2:16, 162s, g3115] wandb: Find logs at: ../../../../../../mmfs1/gscratch/clmbr/revr/LRMGraph/scripts/axolotl/wandb/run-20250411_220735-su8l84sj/logs
[err, 2:16, 162s, g3115] Traceback (most recent call last):
[err, 2:16, 162s, g3115]   File "<frozen runpy>", line 198, in _run_module_as_main
[err, 2:16, 162s, g3115]   File "<frozen runpy>", line 88, in _run_code
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 117, in <module>
[err, 2:16, 162s, g3115]     fire.Fire(do_cli)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
[err, 2:16, 162s, g3115]     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[err, 2:16, 162s, g3115]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
[err, 2:16, 162s, g3115]     component, remaining_args = _CallAndUpdateTrace(
[err, 2:16, 162s, g3115]                                 ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[err, 2:16, 162s, g3115]     component = fn(*varargs, **kwargs)
[err, 2:16, 162s, g3115]                 ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 91, in do_cli
[err, 2:16, 162s, g3115]     return do_train(parsed_cfg, parsed_cli_args)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 50, in do_train
[err, 2:16, 162s, g3115]     model, tokenizer, trainer = train(cfg=cfg, dataset_meta=dataset_meta)
[err, 2:16, 162s, g3115]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 507, in train
[err, 2:16, 162s, g3115]     execute_training(cfg, trainer, resume_from_checkpoint)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 193, in execute_training
[err, 2:16, 162s, g3115]     trainer.train(resume_from_checkpoint=resume_from_checkpoint)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2245, in train
[err, 2:16, 162s, g3115]     return inner_training_loop(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
[err, 2:16, 162s, g3115]     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[err, 2:16, 162s, g3115]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 3730, in training_step
[err, 2:16, 162s, g3115]     inputs = self._prepare_inputs(inputs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115]     return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 647, in _prepare_inputs
[err, 2:16, 162s, g3115]     inputs = self._generate_and_score_completions(inputs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 753, in _generate_and_score_completions
[err, 2:16, 162s, g3115]     ref_per_token_logps = self._get_per_token_logps(
[err, 2:16, 162s, g3115]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115]     return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 589, in _get_per_token_logps
[err, 2:16, 162s, g3115]     logits = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115]     return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115]     return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
[err, 2:16, 162s, g3115]     return fn(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115]     return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115]     return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 2:16, 162s, g3115]     return model_forward(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 2:16, 162s, g3115]     return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 2:16, 162s, g3115]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 2:16, 162s, g3115]     return func(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115]     output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 2:16, 162s, g3115]     return func(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 2:16, 162s, g3115]     outputs: BaseModelOutputWithPast = self.model(
[err, 2:16, 162s, g3115]                                        ^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115]     return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115]     return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115]     output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
[err, 2:16, 162s, g3115]     return self._torchdynamo_orig_callable(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1164, in __call__
[err, 2:16, 162s, g3115]     result = self._inner_convert(
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
[err, 2:16, 162s, g3115]     return _compile(
[err, 2:16, 162s, g3115]            ^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
[err, 2:16, 162s, g3115]     guarded_code = compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner
[err, 2:16, 162s, g3115]     return _compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
[err, 2:16, 162s, g3115]     return function(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner
[err, 2:16, 162s, g3115]     out_code = transform_code_object(code, transform)
[err, 2:16, 162s, g3115]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
[err, 2:16, 162s, g3115]     transformations(instructions, code_options)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn
[err, 2:16, 162s, g3115]     return fn(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform
[err, 2:16, 162s, g3115]     tracer.run()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run
[err, 2:16, 162s, g3115]     super().run()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run
[err, 2:16, 162s, g3115]     while self.step():
[err, 2:16, 162s, g3115]           ^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step
[err, 2:16, 162s, g3115]     self.dispatch_table[inst.opcode](self, inst)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 657, in wrapper
[err, 2:16, 162s, g3115]     return handle_graph_break(self, inst, speculation.reason)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 698, in handle_graph_break
[err, 2:16, 162s, g3115]     self.output.compile_subgraph(self, reason=reason)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1136, in compile_subgraph
[err, 2:16, 162s, g3115]     self.compile_and_call_fx_graph(
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph
[err, 2:16, 162s, g3115]     compiled_fn = self.call_user_compiler(gm)
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler
[err, 2:16, 162s, g3115]     return self._call_user_compiler(gm)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler
[err, 2:16, 162s, g3115]     raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler
[err, 2:16, 162s, g3115]     compiled_fn = compiler_fn(gm, self.example_inputs())
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
[err, 2:16, 162s, g3115]     compiled_gm = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/__init__.py", line 2340, in __call__
[err, 2:16, 162s, g3115]     return compile_fx(model_, inputs_, config_patches=self.config)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx
[err, 2:16, 162s, g3115]     return aot_autograd(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__
[err, 2:16, 162s, g3115]     cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
[err, 2:16, 162s, g3115]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified
[err, 2:16, 162s, g3115]     compiled_fn = dispatch_and_compile()
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
[err, 2:16, 162s, g3115]     compiled_fn, _ = create_aot_dispatcher_function(
[err, 2:16, 162s, g3115]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
[err, 2:16, 162s, g3115]     return _create_aot_dispatcher_function(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
[err, 2:16, 162s, g3115]     compiled_fn, fw_metadata = compiler_fn(
[err, 2:16, 162s, g3115]                                ^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base
[err, 2:16, 162s, g3115]     compiled_fw = compiler(fw_module, updated_flat_args)
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__
[err, 2:16, 162s, g3115]     return self.compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base
[err, 2:16, 162s, g3115]     return inner_compile(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner
[err, 2:16, 162s, g3115]     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
[err, 2:16, 162s, g3115]     inner_compiled_fn = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner
[err, 2:16, 162s, g3115]     mb_compiled_graph = fx_codegen_and_compile(
[err, 2:16, 162s, g3115]                         ^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
[err, 2:16, 162s, g3115]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
[err, 2:16, 162s, g3115]     compiled_fn = graph.compile_to_module().call
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module
[err, 2:16, 162s, g3115]     return self._compile_to_module()
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module
[err, 2:16, 162s, g3115]     mod = PyCodeCache.load_by_key_path(
[err, 2:16, 162s, g3115]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path
[err, 2:16, 162s, g3115]     mod = _reload_python_module(key, path)
[err, 2:16, 162s, g3115]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module
[err, 2:16, 162s, g3115]     exec(code, mod.__dict__, mod.__dict__)
[err, 2:16, 162s, g3115]   File "/tmp/torchinductor_revr/fa/cfaf5f3d7s6g43rxlw4wh76xmjqkr37r3j6wa4w5tgidlfdxtsyk.py", line 48, in <module>
[err, 2:16, 162s, g3115]     triton_poi_fused_embedding_0 = async_compile.triton('triton_poi_fused_embedding_0', '''
[err, 2:16, 162s, g3115]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 213, in triton
[err, 2:16, 162s, g3115]     kernel.precompile()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile
[err, 2:16, 162s, g3115]     compiled_binary, launcher = self._precompile_config(
[err, 2:16, 162s, g3115]                                 ^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 520, in _precompile_config
[err, 2:16, 162s, g3115]     binary._init_handles()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/triton/compiler/compiler.py", line 390, in _init_handles
[err, 2:16, 162s, g3115]     self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
[err, 2:16, 162s, g3115]                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[err, 2:16, 162s, g3115] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] You can suppress this exception and fall back to eager by setting:
[err, 2:16, 162s, g3115]     import torch._dynamo
[err, 2:16, 162s, g3115]     torch._dynamo.config.suppress_errors = True
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] Traceback (most recent call last):
[err, 2:16, 162s, g3115]   File "<frozen runpy>", line 198, in _run_module_as_main
[err, 2:16, 162s, g3115]   File "<frozen runpy>", line 88, in _run_code
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 117, in <module>
[err, 2:16, 162s, g3115]     fire.Fire(do_cli)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
[err, 2:16, 162s, g3115]     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[err, 2:16, 162s, g3115]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
[err, 2:16, 162s, g3115]     component, remaining_args = _CallAndUpdateTrace(
[err, 2:16, 162s, g3115]                                 ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[err, 2:16, 162s, g3115]     component = fn(*varargs, **kwargs)
[err, 2:16, 162s, g3115]                 ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 91, in do_cli
[err, 2:16, 162s, g3115]     return do_train(parsed_cfg, parsed_cli_args)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/train.py", line 50, in do_train
[err, 2:16, 162s, g3115]     model, tokenizer, trainer = train(cfg=cfg, dataset_meta=dataset_meta)
[err, 2:16, 162s, g3115]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 507, in train
[err, 2:16, 162s, g3115]     execute_training(cfg, trainer, resume_from_checkpoint)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/train.py", line 193, in execute_training
[err, 2:16, 162s, g3115]     trainer.train(resume_from_checkpoint=resume_from_checkpoint)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2245, in train
[err, 2:16, 162s, g3115]     return inner_training_loop(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
[err, 2:16, 162s, g3115]     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[err, 2:16, 162s, g3115]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/trainer.py", line 3730, in training_step
[err, 2:16, 162s, g3115]     inputs = self._prepare_inputs(inputs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115]     return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 647, in _prepare_inputs
[err, 2:16, 162s, g3115]     inputs = self._generate_and_score_completions(inputs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 753, in _generate_and_score_completions
[err, 2:16, 162s, g3115]     ref_per_token_logps = self._get_per_token_logps(
[err, 2:16, 162s, g3115]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/extras/profiling.py", line 87, in wrapper
[err, 2:16, 162s, g3115]     return func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 589, in _get_per_token_logps
[err, 2:16, 162s, g3115]     logits = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115]     return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115]     return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
[err, 2:16, 162s, g3115]     return fn(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115]     return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115]     return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 814, in forward
[err, 2:16, 162s, g3115]     return model_forward(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 802, in __call__
[err, 2:16, 162s, g3115]     return convert_to_fp32(self.model_forward(*args, **kwargs))
[err, 2:16, 162s, g3115]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[err, 2:16, 162s, g3115]     return func(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115]     output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
[err, 2:16, 162s, g3115]     return func(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 823, in forward
[err, 2:16, 162s, g3115]     outputs: BaseModelOutputWithPast = self.model(
[err, 2:16, 162s, g3115]                                        ^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[err, 2:16, 162s, g3115]     return self._call_impl(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[err, 2:16, 162s, g3115]     return forward_call(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/transformers/utils/generic.py", line 965, in wrapper
[err, 2:16, 162s, g3115]     output = func(self, *args, **kwargs)
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
[err, 2:16, 162s, g3115]     return self._torchdynamo_orig_callable(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1164, in __call__
[err, 2:16, 162s, g3115]     result = self._inner_convert(
[err, 2:16, 162s, g3115]              ^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
[err, 2:16, 162s, g3115]     return _compile(
[err, 2:16, 162s, g3115]            ^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
[err, 2:16, 162s, g3115]     guarded_code = compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner
[err, 2:16, 162s, g3115]     return _compile_inner(code, one_graph, hooks, transform)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
[err, 2:16, 162s, g3115]     return function(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner
[err, 2:16, 162s, g3115]     out_code = transform_code_object(code, transform)
[err, 2:16, 162s, g3115]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
[err, 2:16, 162s, g3115]     transformations(instructions, code_options)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn
[err, 2:16, 162s, g3115]     return fn(*args, **kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform
[err, 2:16, 162s, g3115]     tracer.run()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run
[err, 2:16, 162s, g3115]     super().run()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run
[err, 2:16, 162s, g3115]     while self.step():
[err, 2:16, 162s, g3115]           ^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step
[err, 2:16, 162s, g3115]     self.dispatch_table[inst.opcode](self, inst)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 657, in wrapper
[err, 2:16, 162s, g3115]     return handle_graph_break(self, inst, speculation.reason)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 698, in handle_graph_break
[err, 2:16, 162s, g3115]     self.output.compile_subgraph(self, reason=reason)
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1136, in compile_subgraph
[err, 2:16, 162s, g3115]     self.compile_and_call_fx_graph(
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph
[err, 2:16, 162s, g3115]     compiled_fn = self.call_user_compiler(gm)
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler
[err, 2:16, 162s, g3115]     return self._call_user_compiler(gm)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler
[err, 2:16, 162s, g3115]     raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler
[err, 2:16, 162s, g3115]     compiled_fn = compiler_fn(gm, self.example_inputs())
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
[err, 2:16, 162s, g3115]     compiled_gm = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/__init__.py", line 2340, in __call__
[err, 2:16, 162s, g3115]     return compile_fx(model_, inputs_, config_patches=self.config)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx
[err, 2:16, 162s, g3115]     return aot_autograd(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__
[err, 2:16, 162s, g3115]     cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
[err, 2:16, 162s, g3115]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified
[err, 2:16, 162s, g3115]     compiled_fn = dispatch_and_compile()
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
[err, 2:16, 162s, g3115]     compiled_fn, _ = create_aot_dispatcher_function(
[err, 2:16, 162s, g3115]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
[err, 2:16, 162s, g3115]     return _create_aot_dispatcher_function(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
[err, 2:16, 162s, g3115]     compiled_fn, fw_metadata = compiler_fn(
[err, 2:16, 162s, g3115]                                ^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base
[err, 2:16, 162s, g3115]     compiled_fw = compiler(fw_module, updated_flat_args)
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__
[err, 2:16, 162s, g3115]     return self.compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base
[err, 2:16, 162s, g3115]     return inner_compile(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner
[err, 2:16, 162s, g3115]     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
[err, 2:16, 162s, g3115]     inner_compiled_fn = compiler_fn(gm, example_inputs)
[err, 2:16, 162s, g3115]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner
[err, 2:16, 162s, g3115]     mb_compiled_graph = fx_codegen_and_compile(
[err, 2:16, 162s, g3115]                         ^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
[err, 2:16, 162s, g3115]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
[err, 2:16, 162s, g3115]     compiled_fn = graph.compile_to_module().call
[err, 2:16, 162s, g3115]                   ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module
[err, 2:16, 162s, g3115]     return self._compile_to_module()
[err, 2:16, 162s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module
[err, 2:16, 162s, g3115]     mod = PyCodeCache.load_by_key_path(
[err, 2:16, 162s, g3115]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path
[err, 2:16, 162s, g3115]     mod = _reload_python_module(key, path)
[err, 2:16, 162s, g3115]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module
[err, 2:16, 162s, g3115]     exec(code, mod.__dict__, mod.__dict__)
[err, 2:16, 162s, g3115]   File "/tmp/torchinductor_revr/fa/cfaf5f3d7s6g43rxlw4wh76xmjqkr37r3j6wa4w5tgidlfdxtsyk.py", line 48, in <module>
[err, 2:16, 162s, g3115]     triton_poi_fused_embedding_0 = async_compile.triton('triton_poi_fused_embedding_0', '''
[err, 2:16, 162s, g3115]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 213, in triton
[err, 2:16, 162s, g3115]     kernel.precompile()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile
[err, 2:16, 162s, g3115]     compiled_binary, launcher = self._precompile_config(
[err, 2:16, 162s, g3115]                                 ^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 520, in _precompile_config
[err, 2:16, 162s, g3115]     binary._init_handles()
[err, 2:16, 162s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/triton/compiler/compiler.py", line 390, in _init_handles
[err, 2:16, 162s, g3115]     self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
[err, 2:16, 162s, g3115]                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:16, 162s, g3115] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[err, 2:16, 162s, g3115] SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115]
[err, 2:16, 162s, g3115] You can suppress this exception and fall back to eager by setting:
[err, 2:16, 162s, g3115]     import torch._dynamo
[err, 2:16, 162s, g3115]     torch._dynamo.config.suppress_errors = True
[err, 2:16, 162s, g3115]
[out, 2:17, 164s, g3115] INFO:     127.0.0.1:36960 - "POST /close_communicator/ HTTP/1.1" 200 OK
[out, 2:22, 168s, g3115] [2025-04-11 22:08:36,407] [ERROR] [root.train:178] [PID:17959] Failed to train/fine-tune config 'gsm8k.yaml': Command '['accelerate', 'launch', '--num_processes', '1', '-m', 'axolotl.cli.train', 'gsm8k.yaml', '--debug-num-examples', '0']' returned non-zero exit status 1.
[err, 2:22, 168s, g3115] Traceback (most recent call last):
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/bin/accelerate", line 8, in <module>
[err, 2:22, 168s, g3115]     sys.exit(main())
[err, 2:22, 168s, g3115]              ^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
[err, 2:22, 168s, g3115]     args.func(args)
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1213, in launch_command
[err, 2:22, 168s, g3115]     simple_launcher(args)
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 795, in simple_launcher
[err, 2:22, 168s, g3115]     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[err, 2:22, 168s, g3115] subprocess.CalledProcessError: Command '['/gscratch/clmbr/revr/LRMGraph/axolotl_env/bin/python', '-m', 'axolotl.cli.train', 'gsm8k.yaml', '--debug-num-examples', '0']' returned non-zero exit status 1.
[err, 2:22, 168s, g3115] Traceback (most recent call last):
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/bin/axolotl", line 8, in <module>
[err, 2:22, 168s, g3115]     sys.exit(main())
[err, 2:22, 168s, g3115]              ^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/main.py", line 337, in main
[err, 2:22, 168s, g3115]     cli()
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
[err, 2:22, 168s, g3115]     return self.main(*args, **kwargs)
[err, 2:22, 168s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1082, in main
[err, 2:22, 168s, g3115]     rv = self.invoke(ctx)
[err, 2:22, 168s, g3115]          ^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
[err, 2:22, 168s, g3115]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[err, 2:22, 168s, g3115]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
[err, 2:22, 168s, g3115]     return ctx.invoke(self.callback, **ctx.params)
[err, 2:22, 168s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/click/core.py", line 788, in invoke
[err, 2:22, 168s, g3115]     return __callback(*args, **kwargs)
[err, 2:22, 168s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/utils.py", line 66, in wrapper
[err, 2:22, 168s, g3115]     return func(*args, **filtered_kwargs)
[err, 2:22, 168s, g3115]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/main.py", line 180, in train
[err, 2:22, 168s, g3115]     raise exc
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/site-packages/axolotl/cli/main.py", line 165, in train
[err, 2:22, 168s, g3115]     subprocess.run(cmd, check=True)  # nosec B603
[err, 2:22, 168s, g3115]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[err, 2:22, 168s, g3115]   File "/gscratch/clmbr/revr/LRMGraph/axolotl_env/lib/python3.12/subprocess.py", line 573, in run
[err, 2:22, 168s, g3115]     raise CalledProcessError(retcode, process.args,
[err, 2:22, 168s, g3115] subprocess.CalledProcessError: Command '['accelerate', 'launch', '--num_processes', '1', '-m', 'axolotl.cli.train', 'gsm8k.yaml', '--debug-num-examples', '0']' returned non-zero exit status 1.

RevanthRameshkumar, Apr 20 '25 03:04

@RevanthRameshkumar could you try running with torch_compile: False?

SalmanMohammadi, Apr 20 '25 09:04
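The reporter's workaround earlier in this issue was to replace Triton 3.2.0 with 3.1.0. The following is a minimal Python sketch for checking which Triton release is installed before launching GRPO training. The helper names (major_minor, is_suspect_triton, check_installed_triton) are hypothetical. Treating every 3.2.x release as suspect is an assumption drawn from this single report, not an official compatibility rule.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: Triton 3.2.x is the release the reporter saw fail with GRPO + vLLM.
# This comes from the issue report, not from an official compatibility matrix.
SUSPECT_TRITON = (3, 2)


def major_minor(ver: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string like '3.2.0' or '3.1.0+cu124'."""
    m = re.match(r"(\d+)\.(\d+)", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(m.group(1)), int(m.group(2))


def is_suspect_triton(ver: str) -> bool:
    """Return True if ver belongs to the Triton minor release reported as failing."""
    return major_minor(ver) == SUSPECT_TRITON


def check_installed_triton() -> str:
    """Describe the installed Triton version relative to the reported workaround."""
    try:
        # Reads package metadata only; Triton itself is never imported.
        installed = version("triton")
    except PackageNotFoundError:
        return "triton is not installed"
    if is_suspect_triton(installed):
        return (
            f"triton {installed} installed; the reporter worked around the "
            "PY_SSIZE_T_CLEAN error with: pip install triton==3.1.0"
        )
    return f"triton {installed} installed"


if __name__ == "__main__":
    print(check_installed_triton())
```

Run it inside the training environment. It only reads installed package metadata and never imports Triton or Torch, so it cannot trigger the compile error itself.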