torchtune
GRPO LoRA Single Device
Context
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or documentation
- [ ] other (please add here)
#2421 - exploring a LoRA recipe.
Changelog
What are the changes made in this PR?
- Add new configs and a recipe for GRPO LoRA single device
- SFT config for LoRA
Following the pattern in the original GRPO PR, I tried SFT then GRPO. The model reaches around 45% on the train set after 2 epochs.
By reducing the number of generations, I got the recipe running on a single 24GB card, which is a good baseline for ease of access!
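To illustrate why the number of generations is the knob that matters for memory: GRPO computes advantages relative to a group of sampled completions per prompt, so every completion in the group has to be generated, scored, and held at once. A minimal sketch of that group-relative advantage computation (names and shapes are my assumption, not the recipe's actual code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its own group of generations.

    rewards: shape (num_prompts, group_size) -- one row per prompt,
    one column per sampled completion. Shrinking group_size shrinks
    the batch of completions that must be generated and scored,
    which is what lets the recipe fit in 24GB.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four generations each; rewards are e.g. 1.0 for a
# correct GSM8K answer and 0.0 otherwise.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
adv = grpo_advantages(rewards)
```

Each row of `adv` is zero-mean, so a completion is only rewarded for beating its siblings from the same prompt.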
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- [x] run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
- [ ] add unit tests for any new functionality
- [ ] update docstrings for any new or updated methods or classes
- [x] run unit tests via pytest tests
- [ ] run recipe tests via pytest tests -m integration_test
- [x] manually run any new or modified recipes with sufficient proof of correctness
- [ ] include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it. Here is a docstring example and a tutorial example
- [x] I did not change any public API
- [ ] I have added an example to docs or docstrings
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2467
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit 50731aaae7a6d205544f7ff3c31370b788cbc969 with merge base 86f148b46072de65c4889b00a2a6e2ae2475b76e:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Updated this - I validated the recipe for single device, so I figured it was cleaner to make this a single-device-only diff. I reverted the checkpointing changes, as the LoRA from SFT gets merged in and it's cleaner to just start it again. The recipe gets to a decent success rate during training - testing after 4 epochs by running the gsm8k task from lm-eval gives:
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.5436|± |0.0137|
| | |strict-match | 5|exact_match|↑ |0.5421|± |0.0137|
This is (clearly) evaluating on the training data, but for reference the base Llama 3B gets 0.2646 on flexible-extract (after SFT it reaches 0.29, and one epoch of GRPO gets to 0.48). I was a little dubious whether LoRA would work here, and it appears it does!
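For anyone reading the table above: the flexible-extract and strict-match rows differ only in how the numeric answer is pulled out of the model's generation before exact-match scoring. A rough illustration of the two styles (the regexes here are my approximation, not lm-eval's exact filter patterns):

```python
import re
from typing import Optional

def strict_match(text: str) -> Optional[str]:
    # Strict: only accept an answer after the GSM8K-style "#### " marker.
    m = re.search(r"#### (-?[0-9.,]+)", text)
    return m.group(1).replace(",", "") if m else None

def flexible_extract(text: str) -> Optional[str]:
    # Flexible: fall back to the last number appearing anywhere.
    nums = re.findall(r"-?[0-9][0-9.,]*", text)
    return nums[-1].replace(",", "") if nums else None

out = "She sells 16 - 3 - 4 = 9 eggs... so she makes 18 dollars. #### 18"
# strict_match(out) -> "18"; flexible_extract(out) -> "18"
```

The small gap between the two rows (0.5436 vs 0.5421) suggests the model almost always emits the `#### ` terminator it saw during SFT, so little is gained from the lenient extraction.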
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 66.42%. Comparing base (1241231) to head (ecec2aa). Report is 2 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #2467 +/- ##
==========================================
- Coverage 66.98% 66.42% -0.57%
==========================================
Files 378 373 -5
Lines 22527 21986 -541
==========================================
- Hits 15090 14604 -486
+ Misses 7437 7382 -55
:umbrella: View full report in Codecov by Sentry.
I just saw this PR has been hanging here since March! @ianbarber is this recipe not ready for review? Or did we just miss it?
I think it's ready for review - we chatted about it a bit at the time. With the other changes going on I wasn't sure if you wanted it as is or not, though! Happy to land it / fix up any other feedback!
Fixed outstanding lints