Yi-Fu Wu
# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...
# What does this PR do ? Updates vllm to 0.11.2, torch to 2.9, and transformers to 4.57.1. Also switches Automodel to the `main` branch. # Issues List issues that this...
# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...
**Describe the bug** Some jobs in our nightly tests (e.g. `vlm_grpo-qwen2.5-vl-3b-instruct-clevr-1n2g-dtensor2tp1.v1`) get killed by the Idle Job Reaper because we always request 8 GPUs even if we...
# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...
**Describe the bug** Qwen2.5-VL on the Megatron path crashes after the Megatron-Bridge rebase to the `main` branch. ``` File "/opt/nemo-rl/3rdparty/Megatron-Bridge-workspace/Megatron-Bridge/src/megatron/bridge/models/qwen_vl/modeling_qwen25_vl.py", line 193, in forward position_ids, rope_deltas = self.get_rope_index( ^^^^^^^^^^^^^^^^^^^^ File "/opt/ray_venvs/nemo_rl.models.policy.megatron_policy_worker.MegatronPolicyWorker/lib/python3.12/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line...
# What does this PR do ? Previously, we skipped the `Float16Module` when `defer_fp32_logits=True`. This PR changes the logic so that we still use the `Float16Module` when `defer_fp32_logits=True`, but...
**Is your feature request related to a problem? Please describe.** Since checkpoints are often converted to Hugging Face (HF) format for evaluation, add a config option to optionally save HF-format checkpoints in...