[Train] Add example of fine-tuning Llama-2 on Intel Gaudi
Why are these changes needed?
To leverage the potential of the Intel Gaudi accelerator, we extend Ray Train's capabilities by adding support for Intel Gaudi (HPU) hardware. This PR includes an example of fine-tuning Llama-2-7b on multiple HPUs.
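For context, a minimal sketch of how a Ray Train job can be pointed at HPU workers. The actual notebook's training loop uses `transformers`/`optimum-habana`; `train_func` below is only a placeholder, and the `"HPU"` resource key, worker count, and `hccl` backend are assumptions based on Ray's HPU support rather than code taken from this PR:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer


def train_func(config):
    # Placeholder for the per-worker fine-tuning loop; the example notebook
    # builds the Llama-2-7b model and the Hugging Face Trainer here.
    pass


trainer = TorchTrainer(
    train_loop_per_worker=train_func,
    # Use the HCCL backend for collective communication on Gaudi.
    torch_config=TorchConfig(backend="hccl"),
    # Request one HPU per worker instead of a GPU.
    scaling_config=ScalingConfig(
        num_workers=8,
        resources_per_worker={"CPU": 1, "HPU": 1},
    ),
)
result = trainer.fit()
```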
Related issue number
Checks
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
@justinvyu Can you take a look and merge it?
The examples look good! Just a few requests:
- Clear the cell outputs, and just put a mock markdown cell with the important output info. For example, just this information:
train_result = TrainOutput(global_step=62, training_loss=1.500297857869056, metrics={'train_runtime': 93.3311, 'train_samples_per_second': 71.042, 'train_steps_per_second': 2.222, 'total_flos': 4.02963202792489e+16, 'train_loss': 1.500297857869056, 'epoch': 2.0, 'memory_allocated (GB)': 34.51, 'max_memory_allocated (GB)': 78.72, 'total_memory_available (GB)': 94.62})
- Is it possible to merge these two notebooks so that I can just flip a flag if I want to use deepspeed? Most of the logic is identical, just some extra configs.
- (Just a question, not blocking) Should we also allow full-parameter fine-tuning instead of always using LoRA?
Hi, updated according to your comments:
- Removed the unnecessary cell outputs and kept only the important final outputs.
- Merged the two notebooks into one; the final notebook can run different training methods under different execution modes on HPU (see the sketch below).
- Yes, full-parameter fine-tuning is possible: if not using LoRA, just skip the LoRA conversion when loading the pre-trained model.
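To illustrate the flag-based setup in the merged notebook (a sketch only; `use_lora`, `use_deepspeed`, and `ds_config.json` are hypothetical names for illustration, not the notebook's actual variables):

```python
from transformers import AutoModelForCausalLM

# Hypothetical flags showing how one merged notebook could switch modes.
use_lora = True        # False -> full-parameter fine-tuning
use_deepspeed = False  # True  -> run the training loop with DeepSpeed

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

if use_lora:
    # Only apply the LoRA conversion when LoRA training is requested;
    # skipping this branch leaves the full model trainable.
    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

training_kwargs = {}
if use_deepspeed:
    # Point the HF TrainingArguments at a DeepSpeed JSON config
    # instead of plain distributed data parallel.
    training_kwargs["deepspeed"] = "ds_config.json"  # hypothetical path
```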
Seems that @harborn addressed the comments. @justinvyu could you take a look again?
@justinvyu please take a look again! Thanks.
@harborn I think you need to add back the `orphan: True` metadata: https://github.com/ray-project/ray/pull/44667#discussion_r1600456197