torchtune
Switch remainder of recipe tests over to HF format checkpoints
Context
- [x] update tests by switching to hf format
- addresses #2816
- currently a draft (need to finish 4 more recipe tests)
Changelog
- Switched the formatting of the following recipe tests to hf format similar to issue #2815:
- test_knowledge_distillation_distributed.py
- test_knowledge_distillation_single_device.py
- test_lora_dpo_single_device.py
- test_lora_finetune_distributed.py
- test_lora_finetune_single_device.py
- test_qat_distributed.py
- test_qat_lora_finetune_distributed.py
- test_qat_single_device.py
- utils.py
- Added support for a lora/dora/qlora config for "llama3_hf_138m" in utils.py
Remaining recipes:
- test_eleuther_eval.py
- test_full_dpo_distributed.py
- test_ppo_full_finetune_single_device.py
- test_dpo_distributed.py
TODO
- [ ] complete remaining 4 recipe tests
- [ ] run unit tests via
pytest tests
- [ ] run recipe tests via
pytest tests -m integration_test
Questions/Notes
- I noticed that in the full fine-tune tests the tolerance changed from rtol=1e-4 to rtol=1e-3. I have not changed the rtol values in any of the recipes I touched. Is this okay, or should the tolerance be loosened for the others too?
- I ran a few tests on my machine and hit quite a few OOMs, so I'm not sure I will be able to update the expected losses. The affected values are currently either commented out or marked with a "# TODO". I may need some help running those tests. Alternatively, I could work around the OOMs by lowering the vocab size, if that is okay?
- I committed an unfinished version of test_dpo_distributed by accident.
@krammnic Please let me know if you notice any bugs or errors or have any pointers for me. I should have the remaining recipes done within the next couple of days. Sorry again for the long wait! Life's been pretty crazy haha.
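For reference on the rtol question above, here is a minimal sketch of what loosening rtol from 1e-4 to 1e-3 means in practice. It reimplements, in plain Python, the closeness rule that torch.testing.assert_close documents (|actual - expected| <= atol + rtol * |expected|). The helper name `within_tolerance` and the loss values are hypothetical and not taken from any torchtune test.

```python
def within_tolerance(actual: float, expected: float, rtol: float, atol: float = 0.0) -> bool:
    """Closeness rule: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical loss values, chosen only to illustrate the two thresholds.
expected_loss = 10.5
actual_loss = 10.505  # relative difference of roughly 4.8e-4

# Allowed absolute difference is 10.5 * 1e-4 = 0.00105, so 0.005 is too large.
passes_strict = within_tolerance(actual_loss, expected_loss, rtol=1e-4)

# Allowed absolute difference is 10.5 * 1e-3 = 0.0105, so 0.005 fits.
passes_loose = within_tolerance(actual_loss, expected_loss, rtol=1e-3)

print(passes_strict, passes_loose)  # False True
```

A loss that drifts in the fourth significant digit would fail at rtol=1e-4 but pass at rtol=1e-3, so the choice mostly depends on how much run-to-run variation the switched checkpoints introduce.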
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2871
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Completed
- all but 1 recipe
New TODO
- [ ] switch test_lora_dpo_distributed to hf format
- [ ] run tests and update expected losses (may need some help to run some tests in case of OOMs!)
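On the OOMs and the idea of lowering the vocab size: a rough back-of-the-envelope sketch, assuming the output logits tensor is a significant contributor to peak memory (this has not been confirmed for these tests). The logits have shape (batch, seq_len, vocab), so their size scales linearly with vocab size. The function name and all numbers below are hypothetical and are not read from any torchtune config.

```python
def logits_bytes(batch_size: int, seq_len: int, vocab_size: int, bytes_per_element: int = 4) -> int:
    """Size in bytes of one (batch, seq_len, vocab) logits tensor (fp32 by default)."""
    return batch_size * seq_len * vocab_size * bytes_per_element


# Hypothetical shapes for illustration only.
batch_size, seq_len = 8, 2048

full_vocab = logits_bytes(batch_size, seq_len, vocab_size=128_256)
small_vocab = logits_bytes(batch_size, seq_len, vocab_size=32_000)

gib = 1024 ** 3
print(f"full vocab:  {full_vocab / gib:.2f} GiB")
print(f"small vocab: {small_vocab / gib:.2f} GiB")
print(f"reduction:   {full_vocab / small_vocab:.2f}x")
```

The embedding and output projection weights also scale with vocab size, so the real saving could be larger than this estimate. Note that changing the vocab size would also change the expected loss values.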
Hey! Thanks for the PR. I will review tomorrow