Adil
Inference util to go from a NeMo2 ckpt to MCore model loading for inference, using MCore utils. > [!IMPORTANT] > The `Update branch` button must only be pressed in very rare...
# What does this PR do ? Adds GPT-OSS SFT using AutoModel custom models + DeepEP. To run, launch the nightly container and run ``` NRL_FORCE_REBUILD_VENVS=true uv run examples/run_sft.py --config...
# What does this PR do ? Uses Automodel's FSDP2 manager for initializing the v2 worker. Sharding on current main: ``` 2025-11-12 16:03:44 (DTensorPolicyWorkerV2 pid=1247213) ================================================================================ 2025-11-12 16:03:44 (DTensorPolicyWorkerV2 pid=1247213)...
# What does this PR do ? Adds FP16 support for policy training. https://wandb.ai/nvidia/automodel-rl/workspace?nw=6pzs4djqn28 The wandb workspace above shows BF16 (v1 policy) and FP16 (v1 & v2 policies).
Currently the flag forces the user off custom models but still applies Automodel-specific performance optimizations such as the Liger kernel, a different attention backend, etc. We want to make the...
Example run (here it just compares to itself, but the output would be JSONL from the test run): ``` pytest tests/functional_tests/gt_metrics/test_log_compare.py --ground-truth-jsonl tests/functional_tests/gt_metrics/gpt_oss_20b_te_deepep_train_EP_8.jsonl --compare-jsonl tests/functional_tests/gt_metrics/gpt_oss_20b_te_deepep_train_EP_8.jsonl ```
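For readers unfamiliar with this kind of check, here is a minimal sketch of comparing a ground-truth JSONL metrics log against a test-run log. It is not the code in `test_log_compare.py`; the function names, the `loss` key, and the relative tolerance are all assumptions made for illustration.

```python
import json
import math


def load_jsonl(path):
    """Read one JSON object per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def compare_metrics(ground_truth, candidate, keys=("loss",), rel_tol=1e-3):
    """Return a list of (record_index, key, expected, actual) mismatches.

    `keys` and `rel_tol` are hypothetical defaults, not values taken from
    the real test.
    """
    mismatches = []
    if len(ground_truth) != len(candidate):
        mismatches.append((None, "num_records", len(ground_truth), len(candidate)))
    # zip stops at the shorter log; the length check above reports the gap.
    for i, (expected, actual) in enumerate(zip(ground_truth, candidate)):
        for key in keys:
            if key not in expected or key not in actual:
                # A key missing on either side counts as a mismatch.
                mismatches.append((i, key, expected.get(key), actual.get(key)))
            elif not math.isclose(expected[key], actual[key], rel_tol=rel_tol):
                mismatches.append((i, key, expected[key], actual[key]))
    return mismatches
```

Comparing a file to itself, as in the example run above, yields an empty mismatch list under this sketch.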
**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd...
**Is your feature request related to a problem? Please describe.** Automodel currently does not support automatically dequantizing weights to BF16. **Describe the solution you'd like** We need to be able...
The .so file that gets created contains the python version in the filename. This all works fine if your system python version matches the python version specified in the project...
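As a rough illustration of the version tag in the filename, the standard library reports the suffix the running interpreter uses for compiled extension modules. The helper names below are hypothetical, and this sketch only checks the version-tagged suffix, not other suffixes an interpreter may also accept.

```python
import sys
import sysconfig


def extension_suffix():
    """Return the suffix this interpreter uses for compiled extension modules."""
    return sysconfig.get_config_var("EXT_SUFFIX")


def built_for_this_interpreter(filename):
    """True if `filename` ends with the running interpreter's tagged suffix."""
    return filename.endswith(extension_suffix())


if __name__ == "__main__":
    # Prints the interpreter's (major, minor) version next to its suffix.
    print(sys.version_info[:2], extension_suffix())
```

Running this under the system Python and under the project's Python shows whether the two interpreters expect the same filename.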
**Is your feature request related to a problem? Please describe.** StatefulDataloader is currently not DP-aware. If you change DP size during a checkpoint then you will get inconsistent results because...