torchtune
GRPO LoRA Single Device
Context
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or documentation
- [ ] other (please add here)
#2421 - exploring a LoRA recipe.
Changelog
What are the changes made in this PR?
- Add new configs and a recipe for GRPO LoRA single device
- SFT config for LoRA
Following the pattern in the original GRPO PR, I tried SFT then GRPO. The model reaches around 45% on the train set after 2 epochs.
By reducing the number of generations, I got the recipe running on a single 24GB card, which is a good baseline for ease of access!
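To illustrate why the number of generations is the knob that matters for memory: GRPO computes advantages relative to a group of sampled completions per prompt, so every completion in the group has to be generated, scored, and held at once. A minimal sketch of that group-relative advantage computation (names and shapes are my assumption, not the recipe's actual code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its own group of generations.

    rewards: shape (num_prompts, group_size) -- one row per prompt,
    one column per sampled completion. Shrinking group_size shrinks
    the batch of completions that must be generated and scored,
    which is what lets the recipe fit in 24GB.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four generations each; rewards are e.g. 1.0 for a
# correct GSM8K answer and 0.0 otherwise.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
adv = grpo_advantages(rewards)
```

Each row of `adv` is zero-mean, so a completion is only rewarded for beating its siblings from the same prompt.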
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- [x] run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
- [ ] add unit tests for any new functionality
- [ ] update docstrings for any new or updated methods or classes
- [x] run unit tests via pytest tests
- [ ] run recipe tests via pytest tests -m integration_test
- [x] manually run any new or modified recipes with sufficient proof of correctness
- [ ] include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it. Here is a docstring example and a tutorial example
- [x] I did not change any public API
- [ ] I have added an example to docs or docstrings
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2467
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit 50731aaae7a6d205544f7ff3c31370b788cbc969 with merge base 86f148b46072de65c4889b00a2a6e2ae2475b76e:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Updated this - I validated the recipe for single device, so I figured it was cleaner to make this a single-device-only diff. I reverted the checkpointing changes, as the LoRA from SFT gets merged in and it's cleaner to just start it again. The recipe gets to a decent success rate during training - testing after 4 epochs by running the gsm8k task from lm-eval gives:
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.5436|± |0.0137|
| | |strict-match | 5|exact_match|↑ |0.5421|± |0.0137|
This is (clearly) evaluating on the training data, but for reference the base Llama 3B gets 0.2646 on flexible-extract (after SFT it reaches 0.29, and one epoch of GRPO gets to 0.48). I was a little dubious whether LoRA would work here, and it appears it does!
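For anyone reading the table above: the flexible-extract and strict-match rows differ only in how the numeric answer is pulled out of the model's generation before exact-match scoring. A rough illustration of the two styles (the regexes here are my approximation, not lm-eval's exact filter patterns):

```python
import re
from typing import Optional

def strict_match(text: str) -> Optional[str]:
    # Strict: only accept an answer after the GSM8K-style "#### " marker.
    m = re.search(r"#### (-?[0-9.,]+)", text)
    return m.group(1).replace(",", "") if m else None

def flexible_extract(text: str) -> Optional[str]:
    # Flexible: fall back to the last number appearing anywhere.
    nums = re.findall(r"-?[0-9][0-9.,]*", text)
    return nums[-1].replace(",", "") if nums else None

out = "She sells 16 - 3 - 4 = 9 eggs... so she makes 18 dollars. #### 18"
# strict_match(out) -> "18"; flexible_extract(out) -> "18"
```

The small gap between the two rows (0.5436 vs 0.5421) suggests the model almost always emits the `#### ` terminator it saw during SFT, so little is gained from the lenient extraction.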
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 66.42%. Comparing base (1241231) to head (ecec2aa). Report is 2 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #2467 +/- ##
==========================================
- Coverage 66.98% 66.42% -0.57%
==========================================
Files 378 373 -5
Lines 22527 21986 -541
==========================================
- Hits 15090 14604 -486
+ Misses 7437 7382 -55
:umbrella: View full report in Codecov by Sentry.
I just saw this PR has been hanging here since March! @ianbarber is this recipe not ready for review? Or did we just miss it?
I think it's ready for review - we chatted about it a bit at the time. With the other changes going on I wasn't sure if you wanted it as is or not, though! Happy to land it / fix up any other feedback!
Fixed outstanding lints