GRPO training calling DPO dataset processing logic
Please check that this issue hasn't been reported before.
- [x] I searched previous Bug Reports didn't find any similar reports.
Expected Behavior
GRPO is different from DPO, This is not a proxy RL task through a preference dataset with SFT, there is no notion of chosen/rejected so if the rl: grpo flag is specified there is no reason axolotl ever calls any module from dpo and raises an error because the chosen field is missing (we don't need it in this context).
If I have a conversation dataset type column (with no assistant response), and the verifiable responses in dedicated columns (the rewards func have theses names are arguments) it should just process and work.
Current behaviour
Hello everyone,
I am new to Axolotl and I try to setup a GRPO pipeline with a dataset and reward func that are already working fine with pure hf TRL or Unsloth.
When I launch the training I get this error :
Mapping RL Dataset (num_proc=32): 0%| | 0/481 [00:02<?, ? examples/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 688, in _write_generator_to_queue
for i, result in enumerate(func(**kwargs)):
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3501, in _map_single
for i, example in iter_outputs(shard_iterable):
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3475, in iter_outputs
yield i, apply_function(example, i, offset=offset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3398, in apply_function
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/axolotl/prompt_strategies/dpo/chat_template.py", line 65, in transform_fn
chosen_raw = sample[field_chosen]
~~~~~~^^^^^^^^^^^^^^
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/datasets/formatting/formatting.py", line 278, in __getitem__
value = self.data[key]
~~~~~~~~~^^^^^
KeyError: 'chosen'
Steps to reproduce
Copy paste the GRPO config from https://github.com/axolotl-ai-cloud/grpo_code/blob/main/r1_acecode.yaml and try to may it work with a conversation dataset format (with system and user prompts but no assistant response)
Config yaml
rl: grpo
trl:
beta: 0.001
use_vllm: true
vllm_server_host: 0.0.0.0
vllm_server_port: 8000
vllm_server_timeout: 300
reward_funcs:
- reward_functions.autocoder.dynamicity_reward_func
- reward_functions.autocoder.do_execute_reward_func
- reward_functions.autocoder.overlap_similarity_reward_func
- reward_functions.autocoder.difference_similarity_reward_func
num_generations: 8
max_completion_length: 20000
log_completions: false
chat_template: qwen_25
datasets:
- path: dataset/grpo-dataset-sample-v4-no-cot.json
ds_type: json
split: train
type: chat_template
field_messages: prompt
Possible solution
No response
Which Operating Systems are you using?
- [x] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.12
axolotl branch-commit
0.11.0
Acknowledgements
- [x] My issue title is concise, descriptive, and in title casing.
- [x] I have searched the existing issues to make sure this bug has not been reported yet.
- [x] I am using the latest version of axolotl.
- [x] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Hey, thanks for the Issue. One thing I noticed was that, the type: chat_template.
In the linked example, we pointed to a new transform https://github.com/axolotl-ai-cloud/grpo_code/blob/148ea79321f34bbed79b3b55f04c0a7de002665d/grpo_code/transforms.py#L34 , which properly loads the correct transformation.
We currently don't have built-in GRPO transforms. I suspect that, it's auto-defaulting to DPO's implementation https://docs.axolotl.ai/docs/rlhf.html#chat_template.default
Could you make sure you have the appropriate transform?
Hey ! Thank you for the quick reply, it took me a little bit of time to adjust my env etc. Following your recommandation I created the following transform (which purposely does nothing) :
def grpo_transform(cfg, *args, **kwargs):
def transform_fn(example, tokenizer=None):
return example
return transform_fn
And It seems to solve the problem as the DPO processing functions aren't called anymore.
Now with the same config I have an error on the axolotl vllm-serve command.
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/axolotl/cli/vllm_serve.py", line 100, in llm_worker
llm = LLM(
^^^^
File "xxx/axolotl_toolkit/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 245, in __init__
engine_args = EngineArgs(
^^^^^^^^^^^
TypeError: EngineArgs.__init__() got an unexpected keyword argument 'enable_reasoning'
As I don't want to train a reasoning model I tried to set enable_reasoning: false in the config but this doesn't work.
Which model is this? Does vllm's EngineArgs support that param?
It's a qwen-coder-2.5-7b, normally the model it is not a reasoning model so I don't understand why this parameter exists and why it is set to true.
Thanks, can you try set the below to None
https://github.com/axolotl-ai-cloud/axolotl/blob/7026cd5e9e053d51aa271c1f57f62950bcdc599f/src/axolotl/cli/vllm_serve.py#L65-L67
or alternatively, just delete this line:
https://github.com/axolotl-ai-cloud/axolotl/blob/7026cd5e9e053d51aa271c1f57f62950bcdc599f/src/axolotl/cli/vllm_serve.py#L81
Ok I tried the first solution you proposed, now the error is gone. Now I am blocked waiting forever for the CUDA graph capture to finish..
$ CUDA_VISIBLE_DEVICES=7 axolotl vllm-serve configs/autocode
r_grpo.yaml
#@@ #@@ @@# @@#
@@ @@ @@ @@ =@@# @@ #@ =@@#.
@@ #@@@@@@@@@ @@ #@#@= @@ #@ .=@@
#@@@@@@@@@@@@@@@@@ =@# @# ##= ## =####=+ @@ =#####+ =#@@###. @@
@@@@@@@@@@/ +@@/ +@@ #@ =@= #@= @@ =@#+ +#@# @@ =@#+ +#@# #@. @@
@@@@@@@@@@ ##@@ ##@@ =@# @# =@# @# @@ @@ @@ @@ #@ #@ @@
@@@@@@@@@@@@@@@@@@@@ #@=+++#@= =@@# @@ @@ @@ @@ #@ #@ @@
=@#=====@@ =@# @# @@ @@ @@ @@ #@ #@ @@
@@@@@@@@@@@@@@@@ @@@@ #@ #@= #@= +@@ #@# =@# @@. =@# =@# #@. @@
=@# @# #@= #@ =#@@@@#= +#@@= +#@@@@#= .##@@+ @@
@@@@ @@@@@@@@@@@@@@@@
[2025-08-01 08:08:26,073] [INFO] [axolotl.cli.config.load_cfg:244] [PID:3348451] [RANK:0] config:
{
"activation_offloading": false,
"axolotl_config_path": "configs/autocoder_grpo.yaml",
"base_model": "/data/checkpoints/Qwen2.5-Coder-7B-AutoCoderV2-SFT-FP16",
"base_model_config": "/data/checkpoints/Qwen2.5-Coder-7B-AutoCoderV2-SFT-FP16",
"batch_size": 64,
"bf16": true,
"capabilities": {
"bf16": true,
"compute_capability": "sm_100",
"fp8": false,
"n_gpu": 1,
"n_node": 1
},
"chat_template": "qwen_25",
"dataloader_num_workers": 2,
"dataloader_pin_memory": true,
"dataloader_prefetch_factor": 32,
"dataset_processes": 224,
"datasets": [
{
"ds_type": "json",
"field_messages": "prompt",
"message_property_mappings": {
"content": "content",
"role": "role"
},
"path": "xxx/grpo-dataset-sample-v4-no-cot.json",
"split": "train",
"trust_remote_code": false,
"type": "grpo_chat_template_transform.grpo_transform"
}
],
"ddp": false,
"device": "cuda:0",
"device_map": "auto",
"env_capabilities": {
"torch_version": "2.7.1"
},
"eval_batch_size": 32,
"eval_causal_lm_metrics": [
"sacrebleu",
"comet",
"ter",
"chrf"
],
"eval_max_new_tokens": 128,
"eval_sample_packing": false,
"eval_table_size": 0,
"evals_per_epoch": 0,
"flash_attention": false,
"fp16": false,
"gc_steps": 1,
"gradient_accumulation_steps": 2,
"gradient_checkpointing": true,
"gradient_checkpointing_kwargs": {
"use_reentrant": false
},
"group_by_length": false,
"learning_rate": 5.3e-06,
"lisa_layers_attribute": "model.layers",
"load_best_model_at_end": false,
"load_in_4bit": false,
"load_in_8bit": false,
"local_rank": 0,
"logging_steps": 1,
"lora_dropout": 0.0,
"loraplus_lr_embedding": 1e-06,
"lr_scheduler": "warmup_stable_decay",
"lr_scheduler_kwargs": {
"min_lr_ratio": 0.1,
"num_cycles": 0.5,
"num_decay_steps": 500,
"num_stable_steps": 1500
},
"max_grad_norm": 1.0,
"max_prompt_len": 512,
"max_steps": 2500,
"mean_resizing_embeddings": false,
"micro_batch_size": 32,
"model_config_type": "qwen2",
"num_epochs": 1.0,
"optimizer": "adamw_torch_fused",
"output_dir": "./model-out",
"pad_to_sequence_len": false,
"pretrain_multipack_attn": true,
"pretrain_multipack_buffer_size": 10000,
"profiler_steps_start": 0,
"qlora_sharded_model_loading": false,
"ray_num_workers": 1,
"resources_per_worker": {
"GPU": 1
},
"rl": "grpo",
"sample_packing": false,
"sample_packing_bin_size": 200,
"sample_packing_group_size": 100000,
"save_only_model": false,
"save_safetensors": true,
"save_steps": 0.5,
"saves_per_epoch": 0,
"sequence_len": 1024,
"sequence_parallel_degree": 1,
"shuffle_merged_datasets": true,
"skip_prepare_dataset": false,
"strict": false,
"tensor_parallel_size": 1,
"tf32": true,
"tiled_mlp_use_original_mlp": true,
"tokenizer_config": "/data/checkpoints/WSD-Qwen2.5-Coder-7B-AutoCoderV2-SFT-FP16",
"torch_compile": true,
"torch_dtype": "torch.bfloat16",
"train_on_inputs": false,
"trl": {
"beta": 0.001,
"log_completions": false,
"mask_truncated_completions": false,
"max_completion_length": 20000,
"num_generations": 8,
"ref_model_mixup_alpha": 0.9,
"ref_model_sync_steps": 64,
"reward_funcs": [
"reward_functions.autocoder.dynamicity_reward_func",
"reward_functions.autocoder.do_execute_reward_func",
"reward_functions.autocoder.overlap_similarity_reward_func",
"reward_functions.autocoder.difference_similarity_reward_func"
],
"scale_rewards": true,
"sync_ref_model": false,
"use_vllm": true,
"vllm_server_host": "0.0.0.0",
"vllm_server_port": 8000,
"vllm_server_timeout": 300
},
"use_ray": false,
"val_set_size": 0.0,
"vllm": {
"data_parallel_size": 1,
"device": "auto",
"dtype": "auto",
"gpu_memory_utilization": 0.85,
"host": "0.0.0.0",
"port": 8000,
"tensor_parallel_size": 1
},
"warmup_steps": 500,
"weight_decay": 0.0,
"world_size": 1
}
WARNING 08-01 08:08:38 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
WARNING 08-01 08:08:38 [cuda.py:280] FlashInfer failed to import for V1 engine on Blackwell (SM 10.0) GPUs; it is recommended to install FlashInfer for better performance.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:00, 5.92it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:00<00:01, 1.94it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:01<00:00, 1.62it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.42it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.60it/s]
Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████████████████████████████| 67/67 [00:01<00:00, 35.32it/s]
I haven't seen that CUDA graph log before. I'll ask the team.
In meantime, where are you running this? Runpod? Locally?
Locally on a DGX-B200, with cuda 12.8 and latest vllm, if it can help here is the versions I have in my env :
$ uv pip list
Package Version Editable project location
---------------------------------------- ----------------------- -------------------------------------------------------
absl-py 2.3.1
accelerate 1.9.0
addict 2.4.0
adlfs 2024.12.0
aiobotocore 2.23.2
aiofiles 23.2.1
aiohappyeyeballs 2.6.1
aiohttp 3.12.14
aioitertools 0.12.0
aiosignal 1.4.0
airportsdata 20250706
annotated-types 0.7.0
antlr4-python3-runtime 4.13.2
anyio 4.9.0
art 6.5
astor 0.8.1
attrs 25.3.0
autoawq 0.2.7.post3
awscli 1.41.13
axolotl 0.12.0.dev0 /home/lulmer/wsd-ai-ml-training/axolotl_toolkit/axolotl
axolotl-contribs-lgpl 0.0.6
axolotl-contribs-mit 0.0.3
azure-core 1.35.0
azure-datalake-store 0.0.53
azure-identity 1.23.1
azure-storage-blob 12.26.0
bitsandbytes 0.46.0
blake3 1.0.5
botocore 1.39.8
cachetools 5.5.2
cbor2 5.6.5
certifi 2025.7.14
cffi 1.17.1
chardet 5.2.0
charset-normalizer 3.4.2
circuitbreaker 2.1.3
click 8.1.8
cloudpickle 3.1.1
colorama 0.4.6
coloredlogs 15.0.1
compressed-tensors 0.10.2
cryptography 44.0.3
cupy-cuda12x 13.5.1
dataproperty 1.1.0
datasets 4.0.0
decorator 5.2.1
deprecated 1.2.18
depyf 0.19.0
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
dnspython 2.7.0
docutils 0.19
einops 0.8.1
email-validator 2.2.0
evaluate 0.4.1
fastapi 0.116.1
fastapi-cli 0.0.8
fastapi-cloud-cli 0.1.4
fastcore 1.8.6
fastrlock 0.8.3
ffmpy 0.6.1
filelock 3.18.0
fire 0.7.0
frozenlist 1.7.0
fsspec 2025.3.0
gcsfs 2025.3.0
gguf 0.17.1
gitdb 4.0.12
gitpython 3.1.45
google-api-core 2.25.1
google-auth 2.40.3
google-auth-oauthlib 1.2.2
google-cloud-core 2.4.3
google-cloud-storage 3.2.0
google-crc32c 1.7.1
google-resumable-media 2.7.2
googleapis-common-protos 1.70.0
gradio 5.23.3
gradio-client 1.8.0
groovy 0.1.2
grpcio 1.74.0
grpclib 0.4.7
h11 0.16.0
h2 4.2.0
hf-transfer 0.1.9
hf-xet 1.1.2
hpack 4.1.0
httpcore 1.0.9
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.33.5
humanfriendly 10.0
hyperframe 6.1.0
idna 3.10
immutabledict 4.2.0
importlib-metadata 8.0.0
interegular 0.3.3
isodate 0.7.2
jinja2 3.1.6
jiter 0.10.0
jmespath 1.0.1
joblib 1.5.1
jsonlines 4.0.0
jsonschema 4.25.0
jsonschema-specifications 2025.4.1
langdetect 1.0.9
lark 1.2.2
liger-kernel 0.6.0
llguidance 0.7.30
llvmlite 0.44.0
lm-eval 0.4.7
lm-format-enforcer 0.10.11
lxml 6.0.0
markdown 3.8.2
markdown-it-py 3.0.0
markupsafe 3.0.2
mbstrdecoder 1.1.4
mdurl 0.1.2
mistral-common 1.7.0
modal 1.0.2
more-itertools 10.7.0
mpmath 1.3.0
msal 1.33.0
msal-extensions 1.3.1
msgpack 1.1.1
msgspec 0.19.0
multidict 6.6.3
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.5
ninja 1.11.1.4
nltk 3.9.1
numba 0.61.2
numexpr 2.11.0
numpy 2.0.1
nvidia-cublas-cu12 12.8.3.14
nvidia-cuda-cupti-cu12 12.8.57
nvidia-cuda-nvrtc-cu12 12.8.61
nvidia-cuda-runtime-cu12 12.8.57
nvidia-cudnn-cu12 9.7.1.26
nvidia-cufft-cu12 11.3.3.41
nvidia-cufile-cu12 1.13.0.11
nvidia-curand-cu12 10.3.9.55
nvidia-cusolver-cu12 11.7.2.55
nvidia-cusparse-cu12 12.5.7.53
nvidia-cusparselt-cu12 0.6.3
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.26.2
nvidia-nvjitlink-cu12 12.8.61
nvidia-nvtx-cu12 12.8.55
oauthlib 3.3.1
oci 2.156.0
ocifs 1.3.2
openai 1.90.0
opencv-python-headless 4.12.0.88
opentelemetry-api 1.26.0
opentelemetry-exporter-otlp 1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc 1.26.0
opentelemetry-exporter-otlp-proto-http 1.26.0
opentelemetry-proto 1.26.0
opentelemetry-sdk 1.26.0
opentelemetry-semantic-conventions 0.47b0
opentelemetry-semantic-conventions-ai 0.4.11
optimum 1.16.2
orjson 3.11.0
outlines 0.1.11
outlines-core 0.2.10
packaging 23.2
pandas 2.3.1
partial-json-parser 0.2.1.1.post6
pathvalidate 3.3.1
peft 0.16.0
pillow 11.3.0
platformdirs 4.3.8
portalocker 3.2.0
prometheus-client 0.22.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.3.2
proto-plus 1.26.1
protobuf 6.31.1
psutil 7.0.0
py-cpuinfo 9.0.0
pyarrow 21.0.0
pyasn1 0.6.1
pyasn1-modules 0.4.2
pybase64 1.4.1
pybind11 3.0.0
pycountry 24.6.1
pycparser 2.22
pydantic 2.10.6
pydantic-core 2.27.2
pydantic-extra-types 2.10.5
pydub 0.25.1
pygments 2.19.2
pyjwt 2.10.1
pyopenssl 24.3.0
pytablewriter 1.2.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-json-logger 3.3.0
python-multipart 0.0.20
pytz 2025.2
pyyaml 6.0.2
pyzmq 27.0.0
ray 2.48.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.4
requests-oauthlib 2.0.0
responses 0.18.0
rich 14.1.0
rich-toolkit 0.14.8
rignore 0.6.4
rouge-score 0.1.2
rpds-py 0.26.0
rsa 4.7.2
ruff 0.12.5
s3fs 2025.3.0
s3transfer 0.13.1
sacrebleu 2.5.1
safehttpx 0.1.6
safetensors 0.5.3
schedulefree 1.4.1
scikit-learn 1.4.2
scipy 1.16.0
semantic-version 2.10.0
sentencepiece 0.2.0
sentry-sdk 2.33.2
setproctitle 1.3.6
setuptools 79.0.1
shellingham 1.5.4
sigtools 4.0.1
six 1.17.0
smmap 5.0.2
sniffio 1.3.1
soundfile 0.13.1
soxr 0.5.0.post1
sqlitedict 2.1.0
starlette 0.47.2
sympy 1.13.3
synchronicity 0.9.16
tabledata 1.3.4
tabulate 0.9.0
tcolorpy 0.1.7
tensorboard 2.20.0
tensorboard-data-server 0.7.2
termcolor 3.1.0
threadpoolctl 3.6.0
tiktoken 0.9.0
tokenizers 0.21.2
toml 0.10.2
tomlkit 0.13.3
torch 2.7.1+cu128
torchao 0.12.0
torchaudio 2.7.1+cu128
torchvision 0.22.1+cu128
tqdm 4.67.1
tqdm-multiprocess 0.0.11
transformers 4.53.2
triton 3.3.1
trl 0.19.1
typepy 1.3.4
typer 0.16.0
types-certifi 2021.10.8.3
types-toml 0.10.8.20240310
typing-extensions 4.14.1
typing-inspection 0.4.1
tzdata 2025.2
urllib3 2.5.0
uvicorn 0.35.0
uvloop 0.21.0
vllm 0.10.1.dev59+g396ee9418
wandb 0.21.0
watchfiles 1.1.0
websockets 15.0.1
werkzeug 3.1.3
wheel 0.45.1
word2number 1.1
wrapt 1.17.2
xformers 0.0.29.post3
xgrammar 0.1.21
xxhash 3.5.0
yarl 1.20.1
zipp 3.23.0
zstandard 0.22.0
Just to verify, are you able to run, vllm serve ... to see if it's a vllm issue or axolotl issue?
vllm serve <my_model> works flawlessly
vllm serve <my_model>works flawlessly
Same with CUDA_VISIBLE_DEVICES=7 prepended?
If vllm-serve works, can you just leave that up and run the axolotl train command
Thank you for the help, now I tried to launch the training and I have a weird error at the beginning of the forward pass.
File "/.venv/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 1161, in _generate_and_score_completions
completion_ids = [torch.tensor(ids, device=device) for ids in completion_ids]
^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'completion_ids' where it is not associated with a value
@lulmer did you by any chance explicitly set trl.vllm_mode to one of server or colocate ? looking in TRL, if it's set to None, it ends up never defining completion_ids
trl:
vllm_mode: colocate
let me know if https://github.com/axolotl-ai-cloud/axolotl/pull/2998 helps with the completion_ids issue
Interesting ! So pulled the latest version of axolotl with your PR included and tried to run with :
trl:
vllm_mode: colocate
This gives me a CUDA oom error (although 180) : ```bash [rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 272.00 MiB. GPU 0 has a total capacity of 178.36 GiB of which 258.56 MiB is free. Including non-PyTorch memory, this process has 178.10 GiB memory in use. Of the allocated memory 177.13 GiB is allocated by PyTorch, with 39.50 MiB allocated in private pools (e.g., CUDA Graphs), and 30.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
I also tried server mode :
```bash
trl:
vllm_mode: server
and got these :
raise Exception(f"Request failed: {response.status_code}, {response.text}")
Exception: Request failed: 404, {"detail":"Not Found"}
On the vllm serve terminal I can see that axolotl tries to call this endpoint INFO: 127.0.0.1:39030 - "GET /get_world_size/ HTTP/1.1" 404 Not Found
dug into the initial issue around the DPO loader and the reason this is happening is that the type is chat_template. If you look at this example, typically the type handler is just formatting the dataset to the chat message format. If you could provide a sample row of data, we can help figure out a way to make the transform function work.
One thing if colocate mode works for you (don't need to spin up a separate vllm server) would be something like this example: https://github.com/axolotl-ai-cloud/axolotl-cookbook/blob/main/grpo/gsm8k.yaml#L12-L17
Thank you guys for helping me ! So as I said I purposely set an empty transform because I have already preprocessed the dataset in an external script. The dataset already contains a "prompt" field that has two entries, here is a simplified example to give you a feel of what its looks like :
prompt = [
{'role':'system', 'content':'You are a coding model tasked to convert a static code to a dynamic code, here is an explainer of the syntax you can use [...]'},
{'role':'user', 'content':' <The Static Code [...]'> }]
It is worth mentioning that I discarded very long examples of code so I never get huge prompts (20k cutoff limit).
I am also wondering if axolotl has been properly tested on Blackwell architectures.
I am also wondering if axolotl has been properly tested on Blackwell architectures.
I have successfully done GRPO training on the B200. I used the main branch of Axolotl with PyTorch 2.7.1 and vLLM 0.10.0.
Here is the result of conda list.
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 2.3.1 pypi_0 pypi
accelerate 1.10.0.dev0 pypi_0 pypi
addict 2.4.0 pypi_0 pypi
adlfs 2024.12.0 pypi_0 pypi
aiobotocore 2.23.2 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
aiohappyeyeballs 2.6.1 pypi_0 pypi
aiohttp 3.12.15 pypi_0 pypi
aioitertools 0.12.0 pypi_0 pypi
aiosignal 1.4.0 pypi_0 pypi
alsa-lib 1.2.14 hb9d3cd8_0 conda-forge
annotated-types 0.7.0 pypi_0 pypi
antlr4-python3-runtime 4.13.2 pypi_0 pypi
anyio 4.9.0 pypi_0 pypi
apollo-torch 1.0.3 pypi_0 pypi
art 6.5 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
attr 2.5.1 h166bdaf_1 conda-forge
attrs 25.3.0 pypi_0 pypi
autoawq 0.2.7.post3 pypi_0 pypi
axolotl 0.12.0.dev0 pypi_0 pypi
axolotl-contribs-lgpl 0.0.6 pypi_0 pypi
axolotl-contribs-mit 0.0.3 pypi_0 pypi
azure-core 1.35.0 pypi_0 pypi
azure-datalake-store 0.0.53 pypi_0 pypi
azure-identity 1.23.1 pypi_0 pypi
azure-storage-blob 12.26.0 pypi_0 pypi
binutils 2.44 h4852527_1 conda-forge
binutils_impl_linux-64 2.44 h4bf12b8_1 conda-forge
binutils_linux-64 2.44 h4852527_1 conda-forge
bitsandbytes 0.46.0 pypi_0 pypi
blake3 1.0.5 pypi_0 pypi
botocore 1.39.8 pypi_0 pypi
bzip2 1.0.8 h4bc722e_7 conda-forge
c-compiler 1.11.0 h4d9bdce_0 conda-forge
ca-certificates 2025.7.14 hbd8a1cb_0 conda-forge
cachetools 5.5.2 pypi_0 pypi
came-pytorch 0.1.3 pypi_0 pypi
cbor2 5.6.5 pypi_0 pypi
certifi 2025.7.14 pypi_0 pypi
cffi 1.17.1 pypi_0 pypi
chardet 5.2.0 pypi_0 pypi
charset-normalizer 3.4.2 pypi_0 pypi
circuitbreaker 2.1.3 pypi_0 pypi
click 8.1.8 pypi_0 pypi
cloudpickle 3.1.1 pypi_0 pypi
cmake 4.0.3 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
compressed-tensors 0.10.2 pypi_0 pypi
conda-gcc-specs 14.3.0 hb991d5c_4 conda-forge
cryptography 44.0.3 pypi_0 pypi
cuda-bindings 12.9.0 pypi_0 pypi
cuda-cccl_linux-64 12.8.90 ha770c72_1 conda-forge
cuda-command-line-tools 12.8.1 ha770c72_0 conda-forge
cuda-compiler 12.8.1 hbad6d8a_0 conda-forge
cuda-crt-dev_linux-64 12.8.93 ha770c72_3 conda-forge
cuda-crt-tools 12.8.93 ha770c72_3 conda-forge
cuda-cudart 12.8.90 h5888daf_1 conda-forge
cuda-cudart-dev 12.8.90 h5888daf_1 conda-forge
cuda-cudart-dev_linux-64 12.8.90 h3f2d84a_1 conda-forge
cuda-cudart-static 12.8.90 h5888daf_1 conda-forge
cuda-cudart-static_linux-64 12.8.90 h3f2d84a_1 conda-forge
cuda-cudart_linux-64 12.8.90 h3f2d84a_1 conda-forge
cuda-cuobjdump 12.8.90 hbd13f7d_1 conda-forge
cuda-cupti 12.8.90 h5888daf_1 conda-forge
cuda-cupti-dev 12.8.90 h5888daf_1 conda-forge
cuda-cuxxfilt 12.8.90 hbd13f7d_1 conda-forge
cuda-driver-dev 12.8.90 h5888daf_1 conda-forge
cuda-driver-dev_linux-64 12.8.90 h3f2d84a_1 conda-forge
cuda-gdb 12.8.90 ha677faa_1 conda-forge
cuda-libraries 12.8.1 ha770c72_0 conda-forge
cuda-libraries-dev 12.8.1 ha770c72_0 conda-forge
cuda-nsight 12.8.90 h7938cbb_1 conda-forge
cuda-nvcc 12.8.93 hcdd1206_2 conda-forge
cuda-nvcc-dev_linux-64 12.8.93 he91c749_3 conda-forge
cuda-nvcc-impl 12.8.93 h85509e4_3 conda-forge
cuda-nvcc-tools 12.8.93 he02047a_3 conda-forge
cuda-nvcc_linux-64 12.8.93 he0b4e1d_2 conda-forge
cuda-nvdisasm 12.8.90 hbd13f7d_1 conda-forge
cuda-nvml-dev 12.8.90 hbd13f7d_1 conda-forge
cuda-nvprof 12.8.90 hcf8d014_1 conda-forge
cuda-nvprune 12.8.90 hbd13f7d_1 conda-forge
cuda-nvrtc 12.8.93 h5888daf_1 conda-forge
cuda-nvrtc-dev 12.8.93 h5888daf_1 conda-forge
cuda-nvtx 12.8.90 h5888daf_1 conda-forge
cuda-nvvm-dev_linux-64 12.8.93 ha770c72_3 conda-forge
cuda-nvvm-impl 12.8.93 he02047a_3 conda-forge
cuda-nvvm-tools 12.8.93 he02047a_3 conda-forge
cuda-nvvp 12.8.93 hbd13f7d_1 conda-forge
cuda-opencl 12.8.90 h5888daf_1 conda-forge
cuda-opencl-dev 12.8.90 h5888daf_1 conda-forge
cuda-profiler-api 12.8.90 h7938cbb_1 conda-forge
cuda-python 12.9.0 pypi_0 pypi
cuda-sanitizer-api 12.8.93 hbd13f7d_1 conda-forge
cuda-toolkit 12.8.1 0 nvidia/label/cuda-12.8.1
cuda-tools 12.8.1 ha770c72_0 conda-forge
cuda-version 12.8 h5d125a7_3 conda-forge
cuda-visual-tools 12.8.1 ha770c72_0 conda-forge
cupy-cuda12x 13.5.1 pypi_0 pypi
cxx-compiler 1.11.0 hfcd1e18_0 conda-forge
dataproperty 1.1.0 pypi_0 pypi
datasets 4.0.0 pypi_0 pypi
dbus 1.16.2 h3c4dab8_0 conda-forge
decorator 5.2.1 pypi_0 pypi
deepspeed 0.17.2 pypi_0 pypi
deepspeed-kernels 0.0.1.dev1698255861 pypi_0 pypi
depyf 0.19.0 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
dnspython 2.7.0 pypi_0 pypi
einops 0.8.1 pypi_0 pypi
email-validator 2.2.0 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
fastapi 0.116.1 pypi_0 pypi
fastapi-cli 0.0.8 pypi_0 pypi
fastapi-cloud-cli 0.1.5 pypi_0 pypi
fastcore 1.8.7 pypi_0 pypi
fastrlock 0.8.3 pypi_0 pypi
ffmpy 0.6.1 pypi_0 pypi
filelock 3.18.0 pypi_0 pypi
fire 0.7.0 pypi_0 pypi
flash-attn 2.8.2 pypi_0 pypi
flashinfer-python 0.2.8 pypi_0 pypi
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_3 conda-forge
fontconfig 2.15.0 h7e30c49_1 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.13.3 ha770c72_1 conda-forge
frozenlist 1.7.0 pypi_0 pypi
fsspec 2025.3.0 pypi_0 pypi
galore-torch 1.0 pypi_0 pypi
gcc 14.3.0 h76bdaa0_4 conda-forge
gcc_impl_linux-64 14.3.0 hd9e9e21_4 conda-forge
gcc_linux-64 14.3.0 h1382650_11 conda-forge
gcsfs 2025.3.0 pypi_0 pypi
gds-tools 1.13.1.3 h5888daf_1 conda-forge
gguf 0.17.1 pypi_0 pypi
gitdb 4.0.12 pypi_0 pypi
gitpython 3.1.45 pypi_0 pypi
gmp 6.3.0 hac33072_2 conda-forge
google-api-core 2.25.1 pypi_0 pypi
google-auth 2.40.3 pypi_0 pypi
google-auth-oauthlib 1.2.2 pypi_0 pypi
google-cloud-core 2.4.3 pypi_0 pypi
google-cloud-storage 3.2.0 pypi_0 pypi
google-crc32c 1.7.1 pypi_0 pypi
google-resumable-media 2.7.2 pypi_0 pypi
googleapis-common-protos 1.70.0 pypi_0 pypi
gradio 5.23.3 pypi_0 pypi
gradio-client 1.8.0 pypi_0 pypi
groovy 0.1.2 pypi_0 pypi
grpcio 1.74.0 pypi_0 pypi
grpclib 0.4.7 pypi_0 pypi
gxx 14.3.0 he448592_4 conda-forge
gxx_impl_linux-64 14.3.0 he663afc_4 conda-forge
gxx_linux-64 14.3.0 ha7acb78_11 conda-forge
h11 0.16.0 pypi_0 pypi
h2 4.2.0 pypi_0 pypi
hf-transfer 0.1.9 pypi_0 pypi
hf-xet 1.1.5 pypi_0 pypi
hjson 3.1.0 pypi_0 pypi
hpack 4.1.0 pypi_0 pypi
httpcore 1.0.9 pypi_0 pypi
httptools 0.6.4 pypi_0 pypi
httpx 0.28.1 pypi_0 pypi
huggingface-hub 0.34.3 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
hyperframe 6.1.0 pypi_0 pypi
icu 75.1 he02047a_0 conda-forge
idna 3.10 pypi_0 pypi
immutabledict 4.2.0 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
isodate 0.7.2 pypi_0 pypi
jinja2 3.1.6 pypi_0 pypi
jiter 0.10.0 pypi_0 pypi
jmespath 1.0.1 pypi_0 pypi
joblib 1.5.1 pypi_0 pypi
jsonlines 4.0.0 pypi_0 pypi
jsonschema 4.25.0 pypi_0 pypi
jsonschema-specifications 2025.4.1 pypi_0 pypi
kernel-headers_linux-64 5.14.0 he073ed8_2 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
langdetect 1.0.9 pypi_0 pypi
lark 1.2.2 pypi_0 pypi
ld_impl_linux-64 2.44 h1423503_1 conda-forge
libcap 2.75 h39aace5_0 conda-forge
libcublas 12.8.4.1 h9ab20c4_1 conda-forge
libcublas-dev 12.8.4.1 h9ab20c4_1 conda-forge
libcufft 11.3.3.83 h5888daf_1 conda-forge
libcufft-dev 11.3.3.83 h5888daf_1 conda-forge
libcufile 1.13.1.3 h628e99a_1 conda-forge
libcufile-dev 1.13.1.3 h5888daf_1 conda-forge
libcurand 10.3.9.90 h9ab20c4_1 conda-forge
libcurand-dev 10.3.9.90 h9ab20c4_1 conda-forge
libcusolver 11.7.3.90 h9ab20c4_1 conda-forge
libcusolver-dev 11.7.3.90 h9ab20c4_1 conda-forge
libcusparse 12.5.8.93 h5888daf_1 conda-forge
libcusparse-dev 12.5.8.93 h5888daf_1 conda-forge
libedit 3.1.20250104 pl5321h7949ede_0 conda-forge
libexpat 2.7.1 hecca717_0 conda-forge
libffi 3.4.6 h2dba641_1 conda-forge
libfreetype 2.13.3 ha770c72_1 conda-forge
libfreetype6 2.13.3 h48d6fc4_1 conda-forge
libgcc 15.1.0 h767d61c_4 conda-forge
libgcc-devel_linux-64 14.3.0 h85bb3a7_104 conda-forge
libgcc-ng 15.1.0 h69a702a_4 conda-forge
libgcrypt-lib 1.11.1 hb9d3cd8_0 conda-forge
libglib 2.84.2 h3618099_0 conda-forge
libglvnd 1.7.0 ha4b6fd6_2 conda-forge
libgomp 15.1.0 h767d61c_4 conda-forge
libgpg-error 1.55 h3f2d84a_0 conda-forge
libiconv 1.18 h4ce23a2_1 conda-forge
liblzma 5.8.1 hb9d3cd8_2 conda-forge
libnl 3.11.0 hb9d3cd8_0 conda-forge
libnpp 12.3.3.100 h9ab20c4_1 conda-forge
libnpp-dev 12.3.3.100 h9ab20c4_1 conda-forge
libnsl 2.0.1 hb9d3cd8_1 conda-forge
libnuma 2.0.18 hb9d3cd8_3 conda-forge
libnvfatbin 12.8.90 h5888daf_1 conda-forge
libnvfatbin-dev 12.8.90 h5888daf_1 conda-forge
libnvjitlink 12.8.93 h5888daf_1 conda-forge
libnvjitlink-dev 12.8.93 h5888daf_1 conda-forge
libnvjpeg 12.3.5.92 h5888daf_1 conda-forge
libnvjpeg-dev 12.3.5.92 ha770c72_1 conda-forge
libopengl 1.7.0 ha4b6fd6_2 conda-forge
libpng 1.6.50 h421ea60_1 conda-forge
libsanitizer 14.3.0 hd08acf3_4 conda-forge
libsqlite 3.50.4 h0c1763c_0 conda-forge
libstdcxx 15.1.0 h8f9b012_4 conda-forge
libstdcxx-devel_linux-64 14.3.0 h85bb3a7_104 conda-forge
libstdcxx-ng 15.1.0 h4852527_4 conda-forge
libsystemd0 257.7 h4e0b6ca_0 conda-forge
libudev1 257.7 hbe16f8c_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcb 1.17.0 h8a09558_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxkbcommon 1.10.0 h65c71a3_0 conda-forge
libxkbfile 1.1.0 h166bdaf_1 conda-forge
libxml2 2.13.8 h4bc477f_0 conda-forge
libzlib 1.3.1 hb9d3cd8_2 conda-forge
liger-kernel 0.6.1 pypi_0 pypi
llguidance 0.7.30 pypi_0 pypi
llvmlite 0.44.0 pypi_0 pypi
lm-eval 0.4.7 pypi_0 pypi
lm-format-enforcer 0.10.11 pypi_0 pypi
lomo-optim 0.1.1 pypi_0 pypi
lxml 6.0.0 pypi_0 pypi
lz4-c 1.10.0 h5888daf_1 conda-forge
markdown 3.8.2 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 3.0.2 pypi_0 pypi
mbstrdecoder 1.1.4 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mistral-common 1.8.3 pypi_0 pypi
modal 1.0.2 pypi_0 pypi
more-itertools 10.7.0 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msal 1.33.0 pypi_0 pypi
msal-extensions 1.3.1 pypi_0 pypi
msgpack 1.1.1 pypi_0 pypi
msgspec 0.19.0 pypi_0 pypi
multidict 6.6.3 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.5 h2d0b736_3 conda-forge
networkx 3.5 pypi_0 pypi
ninja 1.11.1.4 pypi_0 pypi
nltk 3.9.1 pypi_0 pypi
nsight-compute 2025.1.1.2 hb5ebaad_1 conda-forge
nspr 4.37 h29cc59b_0 conda-forge
nss 3.114 hc3c8bcf_0 conda-forge
numba 0.61.2 pypi_0 pypi
numexpr 2.11.0 pypi_0 pypi
numpy 2.0.1 pypi_0 pypi
nvidia-cublas-cu12 12.8.3.14 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.8.57 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.8.61 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.8.57 pypi_0 pypi
nvidia-cudnn-cu12 9.7.1.26 pypi_0 pypi
nvidia-cufft-cu12 11.3.3.41 pypi_0 pypi
nvidia-cufile-cu12 1.13.0.11 pypi_0 pypi
nvidia-curand-cu12 10.3.9.55 pypi_0 pypi
nvidia-cusolver-cu12 11.7.2.55 pypi_0 pypi
nvidia-cusparse-cu12 12.5.7.53 pypi_0 pypi
nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi
nvidia-ml-py 12.560.30 pypi_0 pypi
nvidia-nccl-cu12 2.26.2 pypi_0 pypi
nvidia-nvjitlink-cu12 12.8.61 pypi_0 pypi
nvidia-nvshmem-cu12 3.3.9 pypi_0 pypi
nvidia-nvtx-cu12 12.8.55 pypi_0 pypi
oauthlib 3.3.1 pypi_0 pypi
oci 2.157.0 pypi_0 pypi
ocifs 1.3.2 pypi_0 pypi
ocl-icd 2.3.3 hb9d3cd8_0 conda-forge
openai 1.90.0 pypi_0 pypi
opencl-headers 2025.06.13 h5888daf_0 conda-forge
opencv-python-headless 4.12.0.88 pypi_0 pypi
openssl 3.5.1 h7b32b05_0 conda-forge
optimum 1.16.2 pypi_0 pypi
orjson 3.11.1 pypi_0 pypi
outlines-core 0.2.10 pypi_0 pypi
packaging 23.2 pypi_0 pypi
pandas 2.3.1 pypi_0 pypi
partial-json-parser 0.2.1.1.post6 pypi_0 pypi
pathvalidate 3.3.1 pypi_0 pypi
pcre2 10.45 hc749103_0 conda-forge
peft 0.16.0 pypi_0 pypi
pillow 11.3.0 pypi_0 pypi
pip 25.2 pyh8b19718_0 conda-forge
platformdirs 4.3.8 pypi_0 pypi
portalocker 3.2.0 pypi_0 pypi
prometheus-client 0.22.1 pypi_0 pypi
prometheus-fastapi-instrumentator 7.1.0 pypi_0 pypi
propcache 0.3.2 pypi_0 pypi
proto-plus 1.26.1 pypi_0 pypi
protobuf 6.31.1 pypi_0 pypi
psutil 7.0.0 pypi_0 pypi
pthread-stubs 0.4 hb9d3cd8_1002 conda-forge
py-cpuinfo 9.0.0 pypi_0 pypi
pyarrow 21.0.0 pypi_0 pypi
pyasn1 0.6.1 pypi_0 pypi
pyasn1-modules 0.4.2 pypi_0 pypi
pybase64 1.4.2 pypi_0 pypi
pybind11 3.0.0 pypi_0 pypi
pycountry 24.6.1 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pydantic 2.10.6 pypi_0 pypi
pydantic-core 2.27.2 pypi_0 pypi
pydantic-extra-types 2.10.5 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.19.2 pypi_0 pypi
pyjwt 2.10.1 pypi_0 pypi
pynvml 12.0.0 pypi_0 pypi
pyopenssl 24.3.0 pypi_0 pypi
pytablewriter 1.2.1 pypi_0 pypi
python 3.11.13 h9e4cc4f_0_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python-json-logger 3.3.0 pypi_0 pypi
python-multipart 0.0.20 pypi_0 pypi
pytz 2025.2 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
pyzmq 27.0.0 pypi_0 pypi
ray 2.48.0 pypi_0 pypi
rdma-core 58.0 h5888daf_0 conda-forge
readline 8.2 h8c095d6_2 conda-forge
referencing 0.36.2 pypi_0 pypi
regex 2025.7.34 pypi_0 pypi
requests 2.32.4 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
responses 0.18.0 pypi_0 pypi
restrictedpython 8.0 pypi_0 pypi
rich 14.1.0 pypi_0 pypi
rich-toolkit 0.14.9 pypi_0 pypi
rignore 0.6.4 pypi_0 pypi
rouge-score 0.1.2 pypi_0 pypi
rpds-py 0.26.0 pypi_0 pypi
rsa 4.9.1 pypi_0 pypi
ruff 0.12.7 pypi_0 pypi
s3fs 2025.3.0 pypi_0 pypi
sacrebleu 2.5.1 pypi_0 pypi
safehttpx 0.1.6 pypi_0 pypi
safetensors 0.5.3 pypi_0 pypi
schedulefree 1.4.1 pypi_0 pypi
scikit-learn 1.4.2 pypi_0 pypi
scipy 1.16.1 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
sentry-sdk 2.34.1 pypi_0 pypi
setuptools 80.9.0 pyhff2d567_0 conda-forge
shellingham 1.5.4 pypi_0 pypi
sigtools 4.0.1 pypi_0 pypi
six 1.17.0 pypi_0 pypi
smmap 5.0.2 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soundfile 0.13.1 pypi_0 pypi
soxr 0.5.0.post1 pypi_0 pypi
sqlitedict 2.1.0 pypi_0 pypi
starlette 0.47.2 pypi_0 pypi
sympy 1.14.0 pypi_0 pypi
synchronicity 0.9.16 pypi_0 pypi
sysroot_linux-64 2.34 h087de78_2 conda-forge
tabledata 1.3.4 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tcolorpy 0.1.7 pypi_0 pypi
tensorboard 2.20.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
termcolor 3.1.0 pypi_0 pypi
threadpoolctl 3.6.0 pypi_0 pypi
tiktoken 0.9.0 pypi_0 pypi
tk 8.6.13 noxft_hd72426e_102 conda-forge
tokenizers 0.21.4 pypi_0 pypi
toml 0.10.2 pypi_0 pypi
tomlkit 0.13.3 pypi_0 pypi
torch 2.7.1+cu128 pypi_0 pypi
torch-optimi 0.2.1 pypi_0 pypi
torchao 0.12.0 pypi_0 pypi
torchaudio 2.7.1+cu128 pypi_0 pypi
torchvision 0.22.1+cu128 pypi_0 pypi
tqdm 4.67.1 pypi_0 pypi
tqdm-multiprocess 0.0.11 pypi_0 pypi
transformers 4.54.1 pypi_0 pypi
triton 3.3.1 pypi_0 pypi
trl 0.21.0.dev0 pypi_0 pypi
typepy 1.3.4 pypi_0 pypi
typer 0.16.0 pypi_0 pypi
types-certifi 2021.10.8.3 pypi_0 pypi
types-toml 0.10.8.20240310 pypi_0 pypi
typing-extensions 4.14.1 pypi_0 pypi
tzdata 2025.2 pypi_0 pypi
urllib3 2.5.0 pypi_0 pypi
uvicorn 0.35.0 pypi_0 pypi
uvloop 0.21.0 pypi_0 pypi
vllm 0.10.0 pypi_0 pypi
wandb 0.21.0 pypi_0 pypi
watchfiles 1.1.0 pypi_0 pypi
wayland 1.24.0 h3e06ad9_0 conda-forge
websockets 15.0.1 pypi_0 pypi
werkzeug 3.1.3 pypi_0 pypi
wheel 0.45.1 pyhd8ed1ab_1 conda-forge
word2number 1.1 pypi_0 pypi
wrapt 1.17.2 pypi_0 pypi
xcb-util 0.4.1 h4f16b4b_2 conda-forge
xcb-util-cursor 0.1.5 hb9d3cd8_0 conda-forge
xcb-util-image 0.4.0 hb711507_2 conda-forge
xcb-util-keysyms 0.4.1 hb711507_0 conda-forge
xcb-util-renderutil 0.3.10 hb711507_0 conda-forge
xcb-util-wm 0.4.2 hb711507_0 conda-forge
xformers 0.0.31 pypi_0 pypi
xgrammar 0.1.21 pypi_0 pypi
xkeyboard-config 2.45 hb9d3cd8_0 conda-forge
xorg-libice 1.1.2 hb9d3cd8_0 conda-forge
xorg-libsm 1.2.6 he73a12e_0 conda-forge
xorg-libx11 1.8.12 h4f16b4b_0 conda-forge
xorg-libxau 1.0.12 hb9d3cd8_0 conda-forge
xorg-libxcomposite 0.4.6 hb9d3cd8_2 conda-forge
xorg-libxdamage 1.1.6 hb9d3cd8_0 conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge
xorg-libxext 1.3.6 hb9d3cd8_0 conda-forge
xorg-libxfixes 6.0.1 hb9d3cd8_0 conda-forge
xorg-libxi 1.8.2 hb9d3cd8_0 conda-forge
xorg-libxrandr 1.5.4 hb9d3cd8_0 conda-forge
xorg-libxrender 0.9.12 hb9d3cd8_0 conda-forge
xorg-libxtst 1.2.5 hb9d3cd8_3 conda-forge
xxhash 3.5.0 pypi_0 pypi
yarl 1.20.1 pypi_0 pypi
zstandard 0.22.0 pypi_0 pypi
zstd 1.5.7 hb8e6e7a_2 conda-forge
This gives me a CUDA oom error (although 180) : ```bash [rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 272.00 MiB. GPU 0 has a total capacity of 178.36 GiB of which 258.56 MiB is free. Including non-PyTorch memory, this process has 178.10 GiB memory in use. Of the allocated memory 177.13 GiB is allocated by PyTorch, with 39.50 MiB allocated in private pools (e.g., CUDA Graphs), and 30.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Setting expandable_segments=True may break vLLM. I use PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024 instead.
I found that you allocated too much VRAM for vLLM. You could try setting gpu_memory_utilization: 0.2.
vllm:
gpu_memory_utilization: 0.2
WARNING 08-01 08:08:38 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer. WARNING 08-01 08:08:38 [cuda.py:280] FlashInfer failed to import for V1 engine on Blackwell (SM 10.0) GPUs; it is recommended to install FlashInfer for better performance.
I recommend installing FlashInfer. Just run pip install flashinfer-python; it compiles the kernel on first use.
chat_template: qwen_25
datasets:
- path: dataset/grpo-dataset-sample-v4-no-cot.json
ds_type: json
split: train
type: chat_template
field_messages: prompt
Maybe you could try removing these two lines and see if it helps.
type: chat_template and field_messages: prompt