axolotl icon indicating copy to clipboard operation
axolotl copied to clipboard

GRPO training calling DPO dataset processing logic

Open lulmer opened this issue 5 months ago • 22 comments

Please check that this issue hasn't been reported before.

  • [x] I searched previous Bug Reports didn't find any similar reports.

Expected Behavior

GRPO is different from DPO, This is not a proxy RL task through a preference dataset with SFT, there is no notion of chosen/rejected so if the rl: grpo flag is specified there is no reason axolotl ever calls any module from dpo and raises an error because the chosen field is missing (we don't need it in this context). If I have a conversation dataset type column (with no assistant response), and the verifiable responses in dedicated columns (the rewards func have theses names are arguments) it should just process and work.

Current behaviour

Hello everyone,

I am new to Axolotl and I try to setup a GRPO pipeline with a dataset and reward func that are already working fine with pure hf TRL or Unsloth.

When I launch the training I get this error :

Mapping RL Dataset (num_proc=32):   0%|                                                                                    | 0/481 [00:02<?, ? examples/s]
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 688, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3501, in _map_single
    for i, example in iter_outputs(shard_iterable):
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3475, in iter_outputs
    yield i, apply_function(example, i, offset=offset)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3398, in apply_function
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/axolotl/prompt_strategies/dpo/chat_template.py", line 65, in transform_fn
    chosen_raw = sample[field_chosen]
                 ~~~~~~^^^^^^^^^^^^^^
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/formatting/formatting.py", line 278, in __getitem__
    value = self.data[key]
            ~~~~~~~~~^^^^^
KeyError: 'chosen'

Steps to reproduce

Copy paste the GRPO config from https://github.com/axolotl-ai-cloud/grpo_code/blob/main/r1_acecode.yaml and try to may it work with a conversation dataset format (with system and user prompts but no assistant response)

Config yaml

rl: grpo
trl:
  beta: 0.001    
  use_vllm: true
  vllm_server_host: 0.0.0.0
  vllm_server_port: 8000
  vllm_server_timeout: 300
  reward_funcs:
    - reward_functions.autocoder.dynamicity_reward_func
    - reward_functions.autocoder.do_execute_reward_func
    - reward_functions.autocoder.overlap_similarity_reward_func
    - reward_functions.autocoder.difference_similarity_reward_func

  num_generations: 8
  max_completion_length: 20000
  log_completions: false

chat_template: qwen_25
datasets:
  - path: dataset/grpo-dataset-sample-v4-no-cot.json
    ds_type: json
    split: train
    type: chat_template
    field_messages: prompt

Possible solution

No response

Which Operating Systems are you using?

  • [x] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

3.12

axolotl branch-commit

0.11.0

Acknowledgements

  • [x] My issue title is concise, descriptive, and in title casing.
  • [x] I have searched the existing issues to make sure this bug has not been reported yet.
  • [x] I am using the latest version of axolotl.
  • [x] I have provided enough information for the maintainers to reproduce and diagnose the issue.

lulmer avatar Jul 28 '25 09:07 lulmer

Hey, thanks for the Issue. One thing I noticed was that, the type: chat_template.

In the linked example, we pointed to a new transform https://github.com/axolotl-ai-cloud/grpo_code/blob/148ea79321f34bbed79b3b55f04c0a7de002665d/grpo_code/transforms.py#L34 , which properly loads the correct transformation.

We currently don't have built-in GRPO transforms. I suspect that, it's auto-defaulting to DPO's implementation https://docs.axolotl.ai/docs/rlhf.html#chat_template.default

Could you make sure you have the appropriate transform?

NanoCode012 avatar Jul 29 '25 12:07 NanoCode012

Hey ! Thank you for the quick reply, it took me a little bit of time to adjust my env etc. Following your recommandation I created the following transform (which purposely does nothing) :

def grpo_transform(cfg, *args, **kwargs):

    def transform_fn(example, tokenizer=None):
        return example

    return transform_fn

And It seems to solve the problem as the DPO processing functions aren't called anymore. Now with the same config I have an error on the axolotl vllm-serve command.

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/axolotl/cli/vllm_serve.py", line 100, in llm_worker
    llm = LLM(
          ^^^^
  File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 245, in __init__
    engine_args = EngineArgs(
                  ^^^^^^^^^^^
TypeError: EngineArgs.__init__() got an unexpected keyword argument 'enable_reasoning'

As I don't want to train a reasoning model I tried to set enable_reasoning: false in the config but this doesn't work.

lulmer avatar Jul 31 '25 13:07 lulmer

Which model is this? Does vllm's EngineArgs support that param?

NanoCode012 avatar Jul 31 '25 13:07 NanoCode012

It's a qwen-coder-2.5-7b, normally the model it is not a reasoning model so I don't understand why this parameter exists and why it is set to true.

lulmer avatar Aug 01 '25 07:08 lulmer

Thanks, can you try set the below to None

https://github.com/axolotl-ai-cloud/axolotl/blob/7026cd5e9e053d51aa271c1f57f62950bcdc599f/src/axolotl/cli/vllm_serve.py#L65-L67

or alternatively, just delete this line:

https://github.com/axolotl-ai-cloud/axolotl/blob/7026cd5e9e053d51aa271c1f57f62950bcdc599f/src/axolotl/cli/vllm_serve.py#L81

NanoCode012 avatar Aug 01 '25 07:08 NanoCode012

Ok I tried the first solution you proposed, now the error is gone. Now I am blocked waiting forever for the CUDA graph capture to finish..

$ CUDA_VISIBLE_DEVICES=7 axolotl vllm-serve configs/autocode
r_grpo.yaml

     #@@ #@@      @@# @@#
    @@  @@          @@  @@           =@@#                               @@                 #@    =@@#.
    @@    #@@@@@@@@@    @@           #@#@=                              @@                 #@     .=@@
      #@@@@@@@@@@@@@@@@@            =@# @#     ##=     ##    =####=+    @@      =#####+  =#@@###.   @@
    @@@@@@@@@@/  +@@/  +@@          #@  =@=     #@=   @@   =@#+  +#@#   @@    =@#+  +#@#   #@.      @@
    @@@@@@@@@@  ##@@  ##@@         =@#   @#      =@# @#    @@      @@   @@    @@      #@   #@       @@
     @@@@@@@@@@@@@@@@@@@@          #@=+++#@=      =@@#     @@      @@   @@    @@      #@   #@       @@
                                  =@#=====@@     =@# @#    @@      @@   @@    @@      #@   #@       @@
    @@@@@@@@@@@@@@@@  @@@@        #@      #@=   #@=  +@@   #@#    =@#   @@.   =@#    =@#   #@.      @@
                                 =@#       @#  #@=     #@   =#@@@@#=    +#@@=  +#@@@@#=    .##@@+   @@
    @@@@  @@@@@@@@@@@@@@@@

[2025-08-01 08:08:26,073] [INFO] [axolotl.cli.config.load_cfg:244] [PID:3348451] [RANK:0] config:
{
  "activation_offloading": false,
  "axolotl_config_path": "configs/autocoder_grpo.yaml",
  "base_model": "/data/checkpoints/Qwen2.5-Coder-7B-AutoCoderV2-SFT-FP16",
  "base_model_config": "/data/checkpoints/Qwen2.5-Coder-7B-AutoCoderV2-SFT-FP16",
  "batch_size": 64,
  "bf16": true,
  "capabilities": {
    "bf16": true,
    "compute_capability": "sm_100",
    "fp8": false,
    "n_gpu": 1,
    "n_node": 1
  },
  "chat_template": "qwen_25",
  "dataloader_num_workers": 2,
  "dataloader_pin_memory": true,
  "dataloader_prefetch_factor": 32,
  "dataset_processes": 224,
  "datasets": [
    {
      "ds_type": "json",
      "field_messages": "prompt",
      "message_property_mappings": {
        "content": "content",
        "role": "role"
      },
      "path": "xxx/grpo-dataset-sample-v4-no-cot.json",
      "split": "train",
      "trust_remote_code": false,
      "type": "grpo_chat_template_transform.grpo_transform"
    }
  ],
  "ddp": false,
  "device": "cuda:0",
  "device_map": "auto",
  "env_capabilities": {
    "torch_version": "2.7.1"
  },
  "eval_batch_size": 32,
  "eval_causal_lm_metrics": [
    "sacrebleu",
    "comet",
    "ter",
    "chrf"
  ],
  "eval_max_new_tokens": 128,
  "eval_sample_packing": false,
  "eval_table_size": 0,
  "evals_per_epoch": 0,
  "flash_attention": false,
  "fp16": false,
  "gc_steps": 1,
  "gradient_accumulation_steps": 2,
  "gradient_checkpointing": true,
  "gradient_checkpointing_kwargs": {
    "use_reentrant": false
  },
  "group_by_length": false,
  "learning_rate": 5.3e-06,
  "lisa_layers_attribute": "model.layers",
  "load_best_model_at_end": false,
  "load_in_4bit": false,
  "load_in_8bit": false,
  "local_rank": 0,
  "logging_steps": 1,
  "lora_dropout": 0.0,
  "loraplus_lr_embedding": 1e-06,
  "lr_scheduler": "warmup_stable_decay",
  "lr_scheduler_kwargs": {
    "min_lr_ratio": 0.1,
    "num_cycles": 0.5,
    "num_decay_steps": 500,
    "num_stable_steps": 1500
  },
  "max_grad_norm": 1.0,
  "max_prompt_len": 512,
  "max_steps": 2500,
  "mean_resizing_embeddings": false,
  "micro_batch_size": 32,
  "model_config_type": "qwen2",
  "num_epochs": 1.0,
  "optimizer": "adamw_torch_fused",
  "output_dir": "./model-out",
  "pad_to_sequence_len": false,
  "pretrain_multipack_attn": true,
  "pretrain_multipack_buffer_size": 10000,
  "profiler_steps_start": 0,
  "qlora_sharded_model_loading": false,
  "ray_num_workers": 1,
  "resources_per_worker": {
    "GPU": 1
  },
  "rl": "grpo",
  "sample_packing": false,
  "sample_packing_bin_size": 200,
  "sample_packing_group_size": 100000,
  "save_only_model": false,
  "save_safetensors": true,
  "save_steps": 0.5,
  "saves_per_epoch": 0,
  "sequence_len": 1024,
  "sequence_parallel_degree": 1,
  "shuffle_merged_datasets": true,
  "skip_prepare_dataset": false,
  "strict": false,
  "tensor_parallel_size": 1,
  "tf32": true,
  "tiled_mlp_use_original_mlp": true,
  "tokenizer_config": "/data/checkpoints/WSD-Qwen2.5-Coder-7B-AutoCoderV2-SFT-FP16",
  "torch_compile": true,
  "torch_dtype": "torch.bfloat16",
  "train_on_inputs": false,
  "trl": {
    "beta": 0.001,
    "log_completions": false,
    "mask_truncated_completions": false,
    "max_completion_length": 20000,
    "num_generations": 8,
    "ref_model_mixup_alpha": 0.9,
    "ref_model_sync_steps": 64,
    "reward_funcs": [
      "reward_functions.autocoder.dynamicity_reward_func",
      "reward_functions.autocoder.do_execute_reward_func",
      "reward_functions.autocoder.overlap_similarity_reward_func",
      "reward_functions.autocoder.difference_similarity_reward_func"
    ],
    "scale_rewards": true,
    "sync_ref_model": false,
    "use_vllm": true,
    "vllm_server_host": "0.0.0.0",
    "vllm_server_port": 8000,
    "vllm_server_timeout": 300
  },
  "use_ray": false,
  "val_set_size": 0.0,
  "vllm": {
    "data_parallel_size": 1,
    "device": "auto",
    "dtype": "auto",
    "gpu_memory_utilization": 0.85,
    "host": "0.0.0.0",
    "port": 8000,
    "tensor_parallel_size": 1
  },
  "warmup_steps": 500,
  "weight_decay": 0.0,
  "world_size": 1
}
WARNING 08-01 08:08:38 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
WARNING 08-01 08:08:38 [cuda.py:280] FlashInfer failed to import for V1 engine on Blackwell (SM 10.0) GPUs; it is recommended to install FlashInfer for better performance.
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  5.92it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:00<00:01,  1.94it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:01<00:00,  1.62it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.42it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.60it/s]

Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████████████████████████████| 67/67 [00:01<00:00, 35.32it/s]

lulmer avatar Aug 01 '25 08:08 lulmer

I haven't seen that CUDA graph log before. I'll ask the team.

In meantime, where are you running this? Runpod? Locally?

NanoCode012 avatar Aug 01 '25 08:08 NanoCode012

Locally on a DGX-B200, with cuda 12.8 and latest vllm, if it can help here is the versions I have in my env :

$ uv pip list
Package                                  Version                 Editable project location
---------------------------------------- ----------------------- -------------------------------------------------------
absl-py                                  2.3.1
accelerate                               1.9.0
addict                                   2.4.0
adlfs                                    2024.12.0
aiobotocore                              2.23.2
aiofiles                                 23.2.1
aiohappyeyeballs                         2.6.1
aiohttp                                  3.12.14
aioitertools                             0.12.0
aiosignal                                1.4.0
airportsdata                             20250706
annotated-types                          0.7.0
antlr4-python3-runtime                   4.13.2
anyio                                    4.9.0
art                                      6.5
astor                                    0.8.1
attrs                                    25.3.0
autoawq                                  0.2.7.post3
awscli                                   1.41.13
axolotl                                  0.12.0.dev0             /home/lulmer/wsd-ai-ml-training/axolotl_toolkit/axolotl
axolotl-contribs-lgpl                    0.0.6
axolotl-contribs-mit                     0.0.3
azure-core                               1.35.0
azure-datalake-store                     0.0.53
azure-identity                           1.23.1
azure-storage-blob                       12.26.0
bitsandbytes                             0.46.0
blake3                                   1.0.5
botocore                                 1.39.8
cachetools                               5.5.2
cbor2                                    5.6.5
certifi                                  2025.7.14
cffi                                     1.17.1
chardet                                  5.2.0
charset-normalizer                       3.4.2
circuitbreaker                           2.1.3
click                                    8.1.8
cloudpickle                              3.1.1
colorama                                 0.4.6
coloredlogs                              15.0.1
compressed-tensors                       0.10.2
cryptography                             44.0.3
cupy-cuda12x                             13.5.1
dataproperty                             1.1.0
datasets                                 4.0.0
decorator                                5.2.1
deprecated                               1.2.18
depyf                                    0.19.0
dill                                     0.3.8
diskcache                                5.6.3
distro                                   1.9.0
dnspython                                2.7.0
docutils                                 0.19
einops                                   0.8.1
email-validator                          2.2.0
evaluate                                 0.4.1
fastapi                                  0.116.1
fastapi-cli                              0.0.8
fastapi-cloud-cli                        0.1.4
fastcore                                 1.8.6
fastrlock                                0.8.3
ffmpy                                    0.6.1
filelock                                 3.18.0
fire                                     0.7.0
frozenlist                               1.7.0
fsspec                                   2025.3.0
gcsfs                                    2025.3.0
gguf                                     0.17.1
gitdb                                    4.0.12
gitpython                                3.1.45
google-api-core                          2.25.1
google-auth                              2.40.3
google-auth-oauthlib                     1.2.2
google-cloud-core                        2.4.3
google-cloud-storage                     3.2.0
google-crc32c                            1.7.1
google-resumable-media                   2.7.2
googleapis-common-protos                 1.70.0
gradio                                   5.23.3
gradio-client                            1.8.0
groovy                                   0.1.2
grpcio                                   1.74.0
grpclib                                  0.4.7
h11                                      0.16.0
h2                                       4.2.0
hf-transfer                              0.1.9
hf-xet                                   1.1.2
hpack                                    4.1.0
httpcore                                 1.0.9
httptools                                0.6.4
httpx                                    0.28.1
huggingface-hub                          0.33.5
humanfriendly                            10.0
hyperframe                               6.1.0
idna                                     3.10
immutabledict                            4.2.0
importlib-metadata                       8.0.0
interegular                              0.3.3
isodate                                  0.7.2
jinja2                                   3.1.6
jiter                                    0.10.0
jmespath                                 1.0.1
joblib                                   1.5.1
jsonlines                                4.0.0
jsonschema                               4.25.0
jsonschema-specifications                2025.4.1
langdetect                               1.0.9
lark                                     1.2.2
liger-kernel                             0.6.0
llguidance                               0.7.30
llvmlite                                 0.44.0
lm-eval                                  0.4.7
lm-format-enforcer                       0.10.11
lxml                                     6.0.0
markdown                                 3.8.2
markdown-it-py                           3.0.0
markupsafe                               3.0.2
mbstrdecoder                             1.1.4
mdurl                                    0.1.2
mistral-common                           1.7.0
modal                                    1.0.2
more-itertools                           10.7.0
mpmath                                   1.3.0
msal                                     1.33.0
msal-extensions                          1.3.1
msgpack                                  1.1.1
msgspec                                  0.19.0
multidict                                6.6.3
multiprocess                             0.70.16
nest-asyncio                             1.6.0
networkx                                 3.5
ninja                                    1.11.1.4
nltk                                     3.9.1
numba                                    0.61.2
numexpr                                  2.11.0
numpy                                    2.0.1
nvidia-cublas-cu12                       12.8.3.14
nvidia-cuda-cupti-cu12                   12.8.57
nvidia-cuda-nvrtc-cu12                   12.8.61
nvidia-cuda-runtime-cu12                 12.8.57
nvidia-cudnn-cu12                        9.7.1.26
nvidia-cufft-cu12                        11.3.3.41
nvidia-cufile-cu12                       1.13.0.11
nvidia-curand-cu12                       10.3.9.55
nvidia-cusolver-cu12                     11.7.2.55
nvidia-cusparse-cu12                     12.5.7.53
nvidia-cusparselt-cu12                   0.6.3
nvidia-ml-py                             12.560.30
nvidia-nccl-cu12                         2.26.2
nvidia-nvjitlink-cu12                    12.8.61
nvidia-nvtx-cu12                         12.8.55
oauthlib                                 3.3.1
oci                                      2.156.0
ocifs                                    1.3.2
openai                                   1.90.0
opencv-python-headless                   4.12.0.88
opentelemetry-api                        1.26.0
opentelemetry-exporter-otlp              1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc   1.26.0
opentelemetry-exporter-otlp-proto-http   1.26.0
opentelemetry-proto                      1.26.0
opentelemetry-sdk                        1.26.0
opentelemetry-semantic-conventions       0.47b0
opentelemetry-semantic-conventions-ai    0.4.11
optimum                                  1.16.2
orjson                                   3.11.0
outlines                                 0.1.11
outlines-core                            0.2.10
packaging                                23.2
pandas                                   2.3.1
partial-json-parser                      0.2.1.1.post6
pathvalidate                             3.3.1
peft                                     0.16.0
pillow                                   11.3.0
platformdirs                             4.3.8
portalocker                              3.2.0
prometheus-client                        0.22.1
prometheus-fastapi-instrumentator        7.1.0
propcache                                0.3.2
proto-plus                               1.26.1
protobuf                                 6.31.1
psutil                                   7.0.0
py-cpuinfo                               9.0.0
pyarrow                                  21.0.0
pyasn1                                   0.6.1
pyasn1-modules                           0.4.2
pybase64                                 1.4.1
pybind11                                 3.0.0
pycountry                                24.6.1
pycparser                                2.22
pydantic                                 2.10.6
pydantic-core                            2.27.2
pydantic-extra-types                     2.10.5
pydub                                    0.25.1
pygments                                 2.19.2
pyjwt                                    2.10.1
pyopenssl                                24.3.0
pytablewriter                            1.2.1
python-dateutil                          2.9.0.post0
python-dotenv                            1.0.1
python-json-logger                       3.3.0
python-multipart                         0.0.20
pytz                                     2025.2
pyyaml                                   6.0.2
pyzmq                                    27.0.0
ray                                      2.48.0
referencing                              0.36.2
regex                                    2024.11.6
requests                                 2.32.4
requests-oauthlib                        2.0.0
responses                                0.18.0
rich                                     14.1.0
rich-toolkit                             0.14.8
rignore                                  0.6.4
rouge-score                              0.1.2
rpds-py                                  0.26.0
rsa                                      4.7.2
ruff                                     0.12.5
s3fs                                     2025.3.0
s3transfer                               0.13.1
sacrebleu                                2.5.1
safehttpx                                0.1.6
safetensors                              0.5.3
schedulefree                             1.4.1
scikit-learn                             1.4.2
scipy                                    1.16.0
semantic-version                         2.10.0
sentencepiece                            0.2.0
sentry-sdk                               2.33.2
setproctitle                             1.3.6
setuptools                               79.0.1
shellingham                              1.5.4
sigtools                                 4.0.1
six                                      1.17.0
smmap                                    5.0.2
sniffio                                  1.3.1
soundfile                                0.13.1
soxr                                     0.5.0.post1
sqlitedict                               2.1.0
starlette                                0.47.2
sympy                                    1.13.3
synchronicity                            0.9.16
tabledata                                1.3.4
tabulate                                 0.9.0
tcolorpy                                 0.1.7
tensorboard                              2.20.0
tensorboard-data-server                  0.7.2
termcolor                                3.1.0
threadpoolctl                            3.6.0
tiktoken                                 0.9.0
tokenizers                               0.21.2
toml                                     0.10.2
tomlkit                                  0.13.3
torch                                    2.7.1+cu128
torchao                                  0.12.0
torchaudio                               2.7.1+cu128
torchvision                              0.22.1+cu128
tqdm                                     4.67.1
tqdm-multiprocess                        0.0.11
transformers                             4.53.2
triton                                   3.3.1
trl                                      0.19.1
typepy                                   1.3.4
typer                                    0.16.0
types-certifi                            2021.10.8.3
types-toml                               0.10.8.20240310
typing-extensions                        4.14.1
typing-inspection                        0.4.1
tzdata                                   2025.2
urllib3                                  2.5.0
uvicorn                                  0.35.0
uvloop                                   0.21.0
vllm                                     0.10.1.dev59+g396ee9418
wandb                                    0.21.0
watchfiles                               1.1.0
websockets                               15.0.1
werkzeug                                 3.1.3
wheel                                    0.45.1
word2number                              1.1
wrapt                                    1.17.2
xformers                                 0.0.29.post3
xgrammar                                 0.1.21
xxhash                                   3.5.0
yarl                                     1.20.1
zipp                                     3.23.0
zstandard                                0.22.0

lulmer avatar Aug 01 '25 08:08 lulmer

Just to verify, are you able to run, vllm serve ... to see if it's a vllm issue or axolotl issue?

NanoCode012 avatar Aug 01 '25 08:08 NanoCode012

vllm serve <my_model> works flawlessly

lulmer avatar Aug 01 '25 08:08 lulmer

vllm serve <my_model> works flawlessly

Same with CUDA_VISIBLE_DEVICES=7 prepended?

NanoCode012 avatar Aug 01 '25 09:08 NanoCode012

If vllm-serve works, can you just leave that up and run the axolotl train command

NanoCode012 avatar Aug 01 '25 09:08 NanoCode012

Thank you for the help, now I tried to launch the training and I have a weird error at the beginning of the forward pass.

  File "/.venv/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 1161, in _generate_and_score_completions
    completion_ids = [torch.tensor(ids, device=device) for ids in completion_ids]
                                                                  ^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'completion_ids' where it is not associated with a value

lulmer avatar Aug 01 '25 10:08 lulmer

@lulmer did you by any chance explicitly set trl.vllm_mode to one of server or colocate ? looking in TRL, if it's set to None, it ends up never defining completion_ids

trl:
  vllm_mode: colocate

winglian avatar Aug 01 '25 12:08 winglian

let me know if https://github.com/axolotl-ai-cloud/axolotl/pull/2998 helps with the completion_ids issue

winglian avatar Aug 01 '25 12:08 winglian

Interesting ! So pulled the latest version of axolotl with your PR included and tried to run with :

trl:
  vllm_mode: colocate

This gives me a CUDA oom error (although 180) : ```bash [rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 272.00 MiB. GPU 0 has a total capacity of 178.36 GiB of which 258.56 MiB is free. Including non-PyTorch memory, this process has 178.10 GiB memory in use. Of the allocated memory 177.13 GiB is allocated by PyTorch, with 39.50 MiB allocated in private pools (e.g., CUDA Graphs), and 30.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)



I also tried server mode : 
```bash
trl:
  vllm_mode: server

and got these :

    raise Exception(f"Request failed: {response.status_code}, {response.text}")
Exception: Request failed: 404, {"detail":"Not Found"}

On the vllm serve terminal I can see that axolotl tries to call this endpoint INFO: 127.0.0.1:39030 - "GET /get_world_size/ HTTP/1.1" 404 Not Found

lulmer avatar Aug 01 '25 14:08 lulmer

dug into the initial issue around the DPO loader and the reason this is happening is that the type is chat_template. If you look at this example, typically the type handler is just formatting the dataset to the chat message format. If you could provide a sample row of data, we can help figure out a way to make the transform function work.

One thing if colocate mode works for you (don't need to spin up a separate vllm server) would be something like this example: https://github.com/axolotl-ai-cloud/axolotl-cookbook/blob/main/grpo/gsm8k.yaml#L12-L17

winglian avatar Aug 01 '25 17:08 winglian

Thank you guys for helping me ! So as I said I purposely set an empty transform because I have already preprocessed the dataset in an external script. The dataset already contains a "prompt" field that has two entries, here is a simplified example to give you a feel of what its looks like :

prompt  = [
{'role':'system', 'content':'You are a coding model tasked to convert a static code to a dynamic code, here is an explainer of the syntax you can use [...]'},
{'role':'user', 'content':' <The Static Code [...]'> }] 

It is worth mentioning that I discarded very long examples of code so I never get huge prompts (20k cutoff limit).

I am also wondering if axolotl has been properly tested on Blackwell architectures.

lulmer avatar Aug 04 '25 09:08 lulmer

I am also wondering if axolotl has been properly tested on Blackwell architectures.

I have successfully done GRPO training on the B200. I used the main branch of Axolotl with PyTorch 2.7.1 and vLLM 0.10.0.

Here is the result of conda list.

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   2.3.1                    pypi_0    pypi
accelerate                1.10.0.dev0              pypi_0    pypi
addict                    2.4.0                    pypi_0    pypi
adlfs                     2024.12.0                pypi_0    pypi
aiobotocore               2.23.2                   pypi_0    pypi
aiofiles                  23.2.1                   pypi_0    pypi
aiohappyeyeballs          2.6.1                    pypi_0    pypi
aiohttp                   3.12.15                  pypi_0    pypi
aioitertools              0.12.0                   pypi_0    pypi
aiosignal                 1.4.0                    pypi_0    pypi
alsa-lib                  1.2.14               hb9d3cd8_0    conda-forge
annotated-types           0.7.0                    pypi_0    pypi
antlr4-python3-runtime    4.13.2                   pypi_0    pypi
anyio                     4.9.0                    pypi_0    pypi
apollo-torch              1.0.3                    pypi_0    pypi
art                       6.5                      pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
attr                      2.5.1                h166bdaf_1    conda-forge
attrs                     25.3.0                   pypi_0    pypi
autoawq                   0.2.7.post3              pypi_0    pypi
axolotl                   0.12.0.dev0              pypi_0    pypi
axolotl-contribs-lgpl     0.0.6                    pypi_0    pypi
axolotl-contribs-mit      0.0.3                    pypi_0    pypi
azure-core                1.35.0                   pypi_0    pypi
azure-datalake-store      0.0.53                   pypi_0    pypi
azure-identity            1.23.1                   pypi_0    pypi
azure-storage-blob        12.26.0                  pypi_0    pypi
binutils                  2.44                 h4852527_1    conda-forge
binutils_impl_linux-64    2.44                 h4bf12b8_1    conda-forge
binutils_linux-64         2.44                 h4852527_1    conda-forge
bitsandbytes              0.46.0                   pypi_0    pypi
blake3                    1.0.5                    pypi_0    pypi
botocore                  1.39.8                   pypi_0    pypi
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-compiler                1.11.0               h4d9bdce_0    conda-forge
ca-certificates           2025.7.14            hbd8a1cb_0    conda-forge
cachetools                5.5.2                    pypi_0    pypi
came-pytorch              0.1.3                    pypi_0    pypi
cbor2                     5.6.5                    pypi_0    pypi
certifi                   2025.7.14                pypi_0    pypi
cffi                      1.17.1                   pypi_0    pypi
chardet                   5.2.0                    pypi_0    pypi
charset-normalizer        3.4.2                    pypi_0    pypi
circuitbreaker            2.1.3                    pypi_0    pypi
click                     8.1.8                    pypi_0    pypi
cloudpickle               3.1.1                    pypi_0    pypi
cmake                     4.0.3                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
coloredlogs               15.0.1                   pypi_0    pypi
compressed-tensors        0.10.2                   pypi_0    pypi
conda-gcc-specs           14.3.0               hb991d5c_4    conda-forge
cryptography              44.0.3                   pypi_0    pypi
cuda-bindings             12.9.0                   pypi_0    pypi
cuda-cccl_linux-64        12.8.90              ha770c72_1    conda-forge
cuda-command-line-tools   12.8.1               ha770c72_0    conda-forge
cuda-compiler             12.8.1               hbad6d8a_0    conda-forge
cuda-crt-dev_linux-64     12.8.93              ha770c72_3    conda-forge
cuda-crt-tools            12.8.93              ha770c72_3    conda-forge
cuda-cudart               12.8.90              h5888daf_1    conda-forge
cuda-cudart-dev           12.8.90              h5888daf_1    conda-forge
cuda-cudart-dev_linux-64  12.8.90              h3f2d84a_1    conda-forge
cuda-cudart-static        12.8.90              h5888daf_1    conda-forge
cuda-cudart-static_linux-64 12.8.90              h3f2d84a_1    conda-forge
cuda-cudart_linux-64      12.8.90              h3f2d84a_1    conda-forge
cuda-cuobjdump            12.8.90              hbd13f7d_1    conda-forge
cuda-cupti                12.8.90              h5888daf_1    conda-forge
cuda-cupti-dev            12.8.90              h5888daf_1    conda-forge
cuda-cuxxfilt             12.8.90              hbd13f7d_1    conda-forge
cuda-driver-dev           12.8.90              h5888daf_1    conda-forge
cuda-driver-dev_linux-64  12.8.90              h3f2d84a_1    conda-forge
cuda-gdb                  12.8.90              ha677faa_1    conda-forge
cuda-libraries            12.8.1               ha770c72_0    conda-forge
cuda-libraries-dev        12.8.1               ha770c72_0    conda-forge
cuda-nsight               12.8.90              h7938cbb_1    conda-forge
cuda-nvcc                 12.8.93              hcdd1206_2    conda-forge
cuda-nvcc-dev_linux-64    12.8.93              he91c749_3    conda-forge
cuda-nvcc-impl            12.8.93              h85509e4_3    conda-forge
cuda-nvcc-tools           12.8.93              he02047a_3    conda-forge
cuda-nvcc_linux-64        12.8.93              he0b4e1d_2    conda-forge
cuda-nvdisasm             12.8.90              hbd13f7d_1    conda-forge
cuda-nvml-dev             12.8.90              hbd13f7d_1    conda-forge
cuda-nvprof               12.8.90              hcf8d014_1    conda-forge
cuda-nvprune              12.8.90              hbd13f7d_1    conda-forge
cuda-nvrtc                12.8.93              h5888daf_1    conda-forge
cuda-nvrtc-dev            12.8.93              h5888daf_1    conda-forge
cuda-nvtx                 12.8.90              h5888daf_1    conda-forge
cuda-nvvm-dev_linux-64    12.8.93              ha770c72_3    conda-forge
cuda-nvvm-impl            12.8.93              he02047a_3    conda-forge
cuda-nvvm-tools           12.8.93              he02047a_3    conda-forge
cuda-nvvp                 12.8.93              hbd13f7d_1    conda-forge
cuda-opencl               12.8.90              h5888daf_1    conda-forge
cuda-opencl-dev           12.8.90              h5888daf_1    conda-forge
cuda-profiler-api         12.8.90              h7938cbb_1    conda-forge
cuda-python               12.9.0                   pypi_0    pypi
cuda-sanitizer-api        12.8.93              hbd13f7d_1    conda-forge
cuda-toolkit              12.8.1                        0    nvidia/label/cuda-12.8.1
cuda-tools                12.8.1               ha770c72_0    conda-forge
cuda-version              12.8                 h5d125a7_3    conda-forge
cuda-visual-tools         12.8.1               ha770c72_0    conda-forge
cupy-cuda12x              13.5.1                   pypi_0    pypi
cxx-compiler              1.11.0               hfcd1e18_0    conda-forge
dataproperty              1.1.0                    pypi_0    pypi
datasets                  4.0.0                    pypi_0    pypi
dbus                      1.16.2               h3c4dab8_0    conda-forge
decorator                 5.2.1                    pypi_0    pypi
deepspeed                 0.17.2                   pypi_0    pypi
deepspeed-kernels         0.0.1.dev1698255861          pypi_0    pypi
depyf                     0.19.0                   pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
diskcache                 5.6.3                    pypi_0    pypi
distro                    1.9.0                    pypi_0    pypi
dnspython                 2.7.0                    pypi_0    pypi
einops                    0.8.1                    pypi_0    pypi
email-validator           2.2.0                    pypi_0    pypi
evaluate                  0.4.1                    pypi_0    pypi
fastapi                   0.116.1                  pypi_0    pypi
fastapi-cli               0.0.8                    pypi_0    pypi
fastapi-cloud-cli         0.1.5                    pypi_0    pypi
fastcore                  1.8.7                    pypi_0    pypi
fastrlock                 0.8.3                    pypi_0    pypi
ffmpy                     0.6.1                    pypi_0    pypi
filelock                  3.18.0                   pypi_0    pypi
fire                      0.7.0                    pypi_0    pypi
flash-attn                2.8.2                    pypi_0    pypi
flashinfer-python         0.2.8                    pypi_0    pypi
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_3    conda-forge
fontconfig                2.15.0               h7e30c49_1    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.13.3               ha770c72_1    conda-forge
frozenlist                1.7.0                    pypi_0    pypi
fsspec                    2025.3.0                 pypi_0    pypi
galore-torch              1.0                      pypi_0    pypi
gcc                       14.3.0               h76bdaa0_4    conda-forge
gcc_impl_linux-64         14.3.0               hd9e9e21_4    conda-forge
gcc_linux-64              14.3.0              h1382650_11    conda-forge
gcsfs                     2025.3.0                 pypi_0    pypi
gds-tools                 1.13.1.3             h5888daf_1    conda-forge
gguf                      0.17.1                   pypi_0    pypi
gitdb                     4.0.12                   pypi_0    pypi
gitpython                 3.1.45                   pypi_0    pypi
gmp                       6.3.0                hac33072_2    conda-forge
google-api-core           2.25.1                   pypi_0    pypi
google-auth               2.40.3                   pypi_0    pypi
google-auth-oauthlib      1.2.2                    pypi_0    pypi
google-cloud-core         2.4.3                    pypi_0    pypi
google-cloud-storage      3.2.0                    pypi_0    pypi
google-crc32c             1.7.1                    pypi_0    pypi
google-resumable-media    2.7.2                    pypi_0    pypi
googleapis-common-protos  1.70.0                   pypi_0    pypi
gradio                    5.23.3                   pypi_0    pypi
gradio-client             1.8.0                    pypi_0    pypi
groovy                    0.1.2                    pypi_0    pypi
grpcio                    1.74.0                   pypi_0    pypi
grpclib                   0.4.7                    pypi_0    pypi
gxx                       14.3.0               he448592_4    conda-forge
gxx_impl_linux-64         14.3.0               he663afc_4    conda-forge
gxx_linux-64              14.3.0              ha7acb78_11    conda-forge
h11                       0.16.0                   pypi_0    pypi
h2                        4.2.0                    pypi_0    pypi
hf-transfer               0.1.9                    pypi_0    pypi
hf-xet                    1.1.5                    pypi_0    pypi
hjson                     3.1.0                    pypi_0    pypi
hpack                     4.1.0                    pypi_0    pypi
httpcore                  1.0.9                    pypi_0    pypi
httptools                 0.6.4                    pypi_0    pypi
httpx                     0.28.1                   pypi_0    pypi
huggingface-hub           0.34.3                   pypi_0    pypi
humanfriendly             10.0                     pypi_0    pypi
hyperframe                6.1.0                    pypi_0    pypi
icu                       75.1                 he02047a_0    conda-forge
idna                      3.10                     pypi_0    pypi
immutabledict             4.2.0                    pypi_0    pypi
interegular               0.3.3                    pypi_0    pypi
isodate                   0.7.2                    pypi_0    pypi
jinja2                    3.1.6                    pypi_0    pypi
jiter                     0.10.0                   pypi_0    pypi
jmespath                  1.0.1                    pypi_0    pypi
joblib                    1.5.1                    pypi_0    pypi
jsonlines                 4.0.0                    pypi_0    pypi
jsonschema                4.25.0                   pypi_0    pypi
jsonschema-specifications 2025.4.1                 pypi_0    pypi
kernel-headers_linux-64   5.14.0               he073ed8_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
langdetect                1.0.9                    pypi_0    pypi
lark                      1.2.2                    pypi_0    pypi
ld_impl_linux-64          2.44                 h1423503_1    conda-forge
libcap                    2.75                 h39aace5_0    conda-forge
libcublas                 12.8.4.1             h9ab20c4_1    conda-forge
libcublas-dev             12.8.4.1             h9ab20c4_1    conda-forge
libcufft                  11.3.3.83            h5888daf_1    conda-forge
libcufft-dev              11.3.3.83            h5888daf_1    conda-forge
libcufile                 1.13.1.3             h628e99a_1    conda-forge
libcufile-dev             1.13.1.3             h5888daf_1    conda-forge
libcurand                 10.3.9.90            h9ab20c4_1    conda-forge
libcurand-dev             10.3.9.90            h9ab20c4_1    conda-forge
libcusolver               11.7.3.90            h9ab20c4_1    conda-forge
libcusolver-dev           11.7.3.90            h9ab20c4_1    conda-forge
libcusparse               12.5.8.93            h5888daf_1    conda-forge
libcusparse-dev           12.5.8.93            h5888daf_1    conda-forge
libedit                   3.1.20250104    pl5321h7949ede_0    conda-forge
libexpat                  2.7.1                hecca717_0    conda-forge
libffi                    3.4.6                h2dba641_1    conda-forge
libfreetype               2.13.3               ha770c72_1    conda-forge
libfreetype6              2.13.3               h48d6fc4_1    conda-forge
libgcc                    15.1.0               h767d61c_4    conda-forge
libgcc-devel_linux-64     14.3.0             h85bb3a7_104    conda-forge
libgcc-ng                 15.1.0               h69a702a_4    conda-forge
libgcrypt-lib             1.11.1               hb9d3cd8_0    conda-forge
libglib                   2.84.2               h3618099_0    conda-forge
libglvnd                  1.7.0                ha4b6fd6_2    conda-forge
libgomp                   15.1.0               h767d61c_4    conda-forge
libgpg-error              1.55                 h3f2d84a_0    conda-forge
libiconv                  1.18                 h4ce23a2_1    conda-forge
liblzma                   5.8.1                hb9d3cd8_2    conda-forge
libnl                     3.11.0               hb9d3cd8_0    conda-forge
libnpp                    12.3.3.100           h9ab20c4_1    conda-forge
libnpp-dev                12.3.3.100           h9ab20c4_1    conda-forge
libnsl                    2.0.1                hb9d3cd8_1    conda-forge
libnuma                   2.0.18               hb9d3cd8_3    conda-forge
libnvfatbin               12.8.90              h5888daf_1    conda-forge
libnvfatbin-dev           12.8.90              h5888daf_1    conda-forge
libnvjitlink              12.8.93              h5888daf_1    conda-forge
libnvjitlink-dev          12.8.93              h5888daf_1    conda-forge
libnvjpeg                 12.3.5.92            h5888daf_1    conda-forge
libnvjpeg-dev             12.3.5.92            ha770c72_1    conda-forge
libopengl                 1.7.0                ha4b6fd6_2    conda-forge
libpng                    1.6.50               h421ea60_1    conda-forge
libsanitizer              14.3.0               hd08acf3_4    conda-forge
libsqlite                 3.50.4               h0c1763c_0    conda-forge
libstdcxx                 15.1.0               h8f9b012_4    conda-forge
libstdcxx-devel_linux-64  14.3.0             h85bb3a7_104    conda-forge
libstdcxx-ng              15.1.0               h4852527_4    conda-forge
libsystemd0               257.7                h4e0b6ca_0    conda-forge
libudev1                  257.7                hbe16f8c_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcb                    1.17.0               h8a09558_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxkbcommon              1.10.0               h65c71a3_0    conda-forge
libxkbfile                1.1.0                h166bdaf_1    conda-forge
libxml2                   2.13.8               h4bc477f_0    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
liger-kernel              0.6.1                    pypi_0    pypi
llguidance                0.7.30                   pypi_0    pypi
llvmlite                  0.44.0                   pypi_0    pypi
lm-eval                   0.4.7                    pypi_0    pypi
lm-format-enforcer        0.10.11                  pypi_0    pypi
lomo-optim                0.1.1                    pypi_0    pypi
lxml                      6.0.0                    pypi_0    pypi
lz4-c                     1.10.0               h5888daf_1    conda-forge
markdown                  3.8.2                    pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                3.0.2                    pypi_0    pypi
mbstrdecoder              1.1.4                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mistral-common            1.8.3                    pypi_0    pypi
modal                     1.0.2                    pypi_0    pypi
more-itertools            10.7.0                   pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
msal                      1.33.0                   pypi_0    pypi
msal-extensions           1.3.1                    pypi_0    pypi
msgpack                   1.1.1                    pypi_0    pypi
msgspec                   0.19.0                   pypi_0    pypi
multidict                 6.6.3                    pypi_0    pypi
multiprocess              0.70.16                  pypi_0    pypi
ncurses                   6.5                  h2d0b736_3    conda-forge
networkx                  3.5                      pypi_0    pypi
ninja                     1.11.1.4                 pypi_0    pypi
nltk                      3.9.1                    pypi_0    pypi
nsight-compute            2025.1.1.2           hb5ebaad_1    conda-forge
nspr                      4.37                 h29cc59b_0    conda-forge
nss                       3.114                hc3c8bcf_0    conda-forge
numba                     0.61.2                   pypi_0    pypi
numexpr                   2.11.0                   pypi_0    pypi
numpy                     2.0.1                    pypi_0    pypi
nvidia-cublas-cu12        12.8.3.14                pypi_0    pypi
nvidia-cuda-cupti-cu12    12.8.57                  pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.8.61                  pypi_0    pypi
nvidia-cuda-runtime-cu12  12.8.57                  pypi_0    pypi
nvidia-cudnn-cu12         9.7.1.26                 pypi_0    pypi
nvidia-cufft-cu12         11.3.3.41                pypi_0    pypi
nvidia-cufile-cu12        1.13.0.11                pypi_0    pypi
nvidia-curand-cu12        10.3.9.55                pypi_0    pypi
nvidia-cusolver-cu12      11.7.2.55                pypi_0    pypi
nvidia-cusparse-cu12      12.5.7.53                pypi_0    pypi
nvidia-cusparselt-cu12    0.6.3                    pypi_0    pypi
nvidia-ml-py              12.560.30                pypi_0    pypi
nvidia-nccl-cu12          2.26.2                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.8.61                  pypi_0    pypi
nvidia-nvshmem-cu12       3.3.9                    pypi_0    pypi
nvidia-nvtx-cu12          12.8.55                  pypi_0    pypi
oauthlib                  3.3.1                    pypi_0    pypi
oci                       2.157.0                  pypi_0    pypi
ocifs                     1.3.2                    pypi_0    pypi
ocl-icd                   2.3.3                hb9d3cd8_0    conda-forge
openai                    1.90.0                   pypi_0    pypi
opencl-headers            2025.06.13           h5888daf_0    conda-forge
opencv-python-headless    4.12.0.88                pypi_0    pypi
openssl                   3.5.1                h7b32b05_0    conda-forge
optimum                   1.16.2                   pypi_0    pypi
orjson                    3.11.1                   pypi_0    pypi
outlines-core             0.2.10                   pypi_0    pypi
packaging                 23.2                     pypi_0    pypi
pandas                    2.3.1                    pypi_0    pypi
partial-json-parser       0.2.1.1.post6            pypi_0    pypi
pathvalidate              3.3.1                    pypi_0    pypi
pcre2                     10.45                hc749103_0    conda-forge
peft                      0.16.0                   pypi_0    pypi
pillow                    11.3.0                   pypi_0    pypi
pip                       25.2               pyh8b19718_0    conda-forge
platformdirs              4.3.8                    pypi_0    pypi
portalocker               3.2.0                    pypi_0    pypi
prometheus-client         0.22.1                   pypi_0    pypi
prometheus-fastapi-instrumentator 7.1.0                    pypi_0    pypi
propcache                 0.3.2                    pypi_0    pypi
proto-plus                1.26.1                   pypi_0    pypi
protobuf                  6.31.1                   pypi_0    pypi
psutil                    7.0.0                    pypi_0    pypi
pthread-stubs             0.4               hb9d3cd8_1002    conda-forge
py-cpuinfo                9.0.0                    pypi_0    pypi
pyarrow                   21.0.0                   pypi_0    pypi
pyasn1                    0.6.1                    pypi_0    pypi
pyasn1-modules            0.4.2                    pypi_0    pypi
pybase64                  1.4.2                    pypi_0    pypi
pybind11                  3.0.0                    pypi_0    pypi
pycountry                 24.6.1                   pypi_0    pypi
pycparser                 2.22                     pypi_0    pypi
pydantic                  2.10.6                   pypi_0    pypi
pydantic-core             2.27.2                   pypi_0    pypi
pydantic-extra-types      2.10.5                   pypi_0    pypi
pydub                     0.25.1                   pypi_0    pypi
pygments                  2.19.2                   pypi_0    pypi
pyjwt                     2.10.1                   pypi_0    pypi
pynvml                    12.0.0                   pypi_0    pypi
pyopenssl                 24.3.0                   pypi_0    pypi
pytablewriter             1.2.1                    pypi_0    pypi
python                    3.11.13         h9e4cc4f_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python-dotenv             1.0.1                    pypi_0    pypi
python-json-logger        3.3.0                    pypi_0    pypi
python-multipart          0.0.20                   pypi_0    pypi
pytz                      2025.2                   pypi_0    pypi
pyyaml                    6.0.2                    pypi_0    pypi
pyzmq                     27.0.0                   pypi_0    pypi
ray                       2.48.0                   pypi_0    pypi
rdma-core                 58.0                 h5888daf_0    conda-forge
readline                  8.2                  h8c095d6_2    conda-forge
referencing               0.36.2                   pypi_0    pypi
regex                     2025.7.34                pypi_0    pypi
requests                  2.32.4                   pypi_0    pypi
requests-oauthlib         2.0.0                    pypi_0    pypi
responses                 0.18.0                   pypi_0    pypi
restrictedpython          8.0                      pypi_0    pypi
rich                      14.1.0                   pypi_0    pypi
rich-toolkit              0.14.9                   pypi_0    pypi
rignore                   0.6.4                    pypi_0    pypi
rouge-score               0.1.2                    pypi_0    pypi
rpds-py                   0.26.0                   pypi_0    pypi
rsa                       4.9.1                    pypi_0    pypi
ruff                      0.12.7                   pypi_0    pypi
s3fs                      2025.3.0                 pypi_0    pypi
sacrebleu                 2.5.1                    pypi_0    pypi
safehttpx                 0.1.6                    pypi_0    pypi
safetensors               0.5.3                    pypi_0    pypi
schedulefree              1.4.1                    pypi_0    pypi
scikit-learn              1.4.2                    pypi_0    pypi
scipy                     1.16.1                   pypi_0    pypi
semantic-version          2.10.0                   pypi_0    pypi
sentencepiece             0.2.0                    pypi_0    pypi
sentry-sdk                2.34.1                   pypi_0    pypi
setuptools                80.9.0             pyhff2d567_0    conda-forge
shellingham               1.5.4                    pypi_0    pypi
sigtools                  4.0.1                    pypi_0    pypi
six                       1.17.0                   pypi_0    pypi
smmap                     5.0.2                    pypi_0    pypi
sniffio                   1.3.1                    pypi_0    pypi
soundfile                 0.13.1                   pypi_0    pypi
soxr                      0.5.0.post1              pypi_0    pypi
sqlitedict                2.1.0                    pypi_0    pypi
starlette                 0.47.2                   pypi_0    pypi
sympy                     1.14.0                   pypi_0    pypi
synchronicity             0.9.16                   pypi_0    pypi
sysroot_linux-64          2.34                 h087de78_2    conda-forge
tabledata                 1.3.4                    pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tcolorpy                  0.1.7                    pypi_0    pypi
tensorboard               2.20.0                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
termcolor                 3.1.0                    pypi_0    pypi
threadpoolctl             3.6.0                    pypi_0    pypi
tiktoken                  0.9.0                    pypi_0    pypi
tk                        8.6.13          noxft_hd72426e_102    conda-forge
tokenizers                0.21.4                   pypi_0    pypi
toml                      0.10.2                   pypi_0    pypi
tomlkit                   0.13.3                   pypi_0    pypi
torch                     2.7.1+cu128              pypi_0    pypi
torch-optimi              0.2.1                    pypi_0    pypi
torchao                   0.12.0                   pypi_0    pypi
torchaudio                2.7.1+cu128              pypi_0    pypi
torchvision               0.22.1+cu128             pypi_0    pypi
tqdm                      4.67.1                   pypi_0    pypi
tqdm-multiprocess         0.0.11                   pypi_0    pypi
transformers              4.54.1                   pypi_0    pypi
triton                    3.3.1                    pypi_0    pypi
trl                       0.21.0.dev0              pypi_0    pypi
typepy                    1.3.4                    pypi_0    pypi
typer                     0.16.0                   pypi_0    pypi
types-certifi             2021.10.8.3              pypi_0    pypi
types-toml                0.10.8.20240310          pypi_0    pypi
typing-extensions         4.14.1                   pypi_0    pypi
tzdata                    2025.2                   pypi_0    pypi
urllib3                   2.5.0                    pypi_0    pypi
uvicorn                   0.35.0                   pypi_0    pypi
uvloop                    0.21.0                   pypi_0    pypi
vllm                      0.10.0                   pypi_0    pypi
wandb                     0.21.0                   pypi_0    pypi
watchfiles                1.1.0                    pypi_0    pypi
wayland                   1.24.0               h3e06ad9_0    conda-forge
websockets                15.0.1                   pypi_0    pypi
werkzeug                  3.1.3                    pypi_0    pypi
wheel                     0.45.1             pyhd8ed1ab_1    conda-forge
word2number               1.1                      pypi_0    pypi
wrapt                     1.17.2                   pypi_0    pypi
xcb-util                  0.4.1                h4f16b4b_2    conda-forge
xcb-util-cursor           0.1.5                hb9d3cd8_0    conda-forge
xcb-util-image            0.4.0                hb711507_2    conda-forge
xcb-util-keysyms          0.4.1                hb711507_0    conda-forge
xcb-util-renderutil       0.3.10               hb711507_0    conda-forge
xcb-util-wm               0.4.2                hb711507_0    conda-forge
xformers                  0.0.31                   pypi_0    pypi
xgrammar                  0.1.21                   pypi_0    pypi
xkeyboard-config          2.45                 hb9d3cd8_0    conda-forge
xorg-libice               1.1.2                hb9d3cd8_0    conda-forge
xorg-libsm                1.2.6                he73a12e_0    conda-forge
xorg-libx11               1.8.12               h4f16b4b_0    conda-forge
xorg-libxau               1.0.12               hb9d3cd8_0    conda-forge
xorg-libxcomposite        0.4.6                hb9d3cd8_2    conda-forge
xorg-libxdamage           1.1.6                hb9d3cd8_0    conda-forge
xorg-libxdmcp             1.1.5                hb9d3cd8_0    conda-forge
xorg-libxext              1.3.6                hb9d3cd8_0    conda-forge
xorg-libxfixes            6.0.1                hb9d3cd8_0    conda-forge
xorg-libxi                1.8.2                hb9d3cd8_0    conda-forge
xorg-libxrandr            1.5.4                hb9d3cd8_0    conda-forge
xorg-libxrender           0.9.12               hb9d3cd8_0    conda-forge
xorg-libxtst              1.2.5                hb9d3cd8_3    conda-forge
xxhash                    3.5.0                    pypi_0    pypi
yarl                      1.20.1                   pypi_0    pypi
zstandard                 0.22.0                   pypi_0    pypi
zstd                      1.5.7                hb8e6e7a_2    conda-forge

alex-ht avatar Aug 08 '25 00:08 alex-ht

This gives me a CUDA oom error (although 180) : ```bash [rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 272.00 MiB. GPU 0 has a total capacity of 178.36 GiB of which 258.56 MiB is free. Including non-PyTorch memory, this process has 178.10 GiB memory in use. Of the allocated memory 177.13 GiB is allocated by PyTorch, with 39.50 MiB allocated in private pools (e.g., CUDA Graphs), and 30.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Setting expandable_segments=True may break vLLM. I use PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024 instead.

I found that you allocated too much VRAM for vLLM. You could try setting gpu_memory_utilization: 0.2.

vllm:
    gpu_memory_utilization: 0.2

alex-ht avatar Aug 08 '25 01:08 alex-ht

WARNING 08-01 08:08:38 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer. WARNING 08-01 08:08:38 [cuda.py:280] FlashInfer failed to import for V1 engine on Blackwell (SM 10.0) GPUs; it is recommended to install FlashInfer for better performance.

I recommend installing FlashInfer. Just run pip install flashinfer-python; it compiles the kernel on first use.

alex-ht avatar Aug 08 '25 01:08 alex-ht

chat_template: qwen_25
datasets:
  - path: dataset/grpo-dataset-sample-v4-no-cot.json
    ds_type: json
    split: train
    type: chat_template
    field_messages: prompt

Maybe you could try removing these two lines and see if it helps. type: chat_template and field_messages: prompt

alex-ht avatar Aug 27 '25 07:08 alex-ht