
[Train] Add example of fine-tuning Llama-2 on Intel Gaudi

Open · harborn opened this pull request 10 months ago • 4 comments

Why are these changes needed?

To leverage the potential of the Intel Gaudi accelerator, we extend Ray Train's capabilities by adding support for Intel Gaudi (HPU) hardware. This PR includes an example of fine-tuning Llama-2-7b on multiple HPUs.
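A minimal sketch of how such a trainer can be configured, assuming one HPU per worker and Gaudi's HCCL collective backend (illustrative only; worker counts, resources, and the training loop body are placeholders, not the PR's exact code):

```python
# Illustrative sketch only: schedule one Ray Train worker per HPU and use the
# HCCL collective backend for Gaudi. Worker counts, resources, and the training
# loop body are placeholders.
from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer


def train_func(config):
    # Llama-2-7b fine-tuning logic (tokenizer, model, LoRA, Trainer, ...) goes here.
    ...


trainer = TorchTrainer(
    train_func,
    torch_config=TorchConfig(backend="hccl"),  # Gaudi collective backend
    scaling_config=ScalingConfig(
        num_workers=8,  # one worker per HPU card
        resources_per_worker={"CPU": 1, "HPU": 1},
    ),
)
result = trainer.fit()
```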

Related issue number

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

harborn · Apr 11 '24 08:04

@justinvyu Can you take a look and merge it?

woshiyyya · May 14 '24 22:05

The examples look good! Just a few requests:

  1. Clear the cell outputs, and just put a mock markdown cell with the important output info. For example, just this information:
     train_result = TrainOutput(global_step=62, training_loss=1.500297857869056, metrics={'train_runtime': 93.3311, 'train_samples_per_second': 71.042, 'train_steps_per_second': 2.222, 'total_flos': 4.02963202792489e+16, 'train_loss': 1.500297857869056, 'epoch': 2.0, 'memory_allocated (GB)': 34.51, 'max_memory_allocated (GB)': 78.72, 'total_memory_available (GB)': 94.62})
  2. Is it possible to merge these two notebooks so that I can just flip a flag if I want to use deepspeed? Most of the logic is identical, just some extra configs. (A sketch of the kind of toggle I have in mind is below this list.)
  3. (Just a question, not blocking) Should we also allow full-parameter finetuning instead of always using LoRA?
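Something like a single toggle is what I have in mind (a rough sketch with hypothetical file names and hyperparameters, not the notebook's actual code):

```python
# Hypothetical sketch of the "one notebook, one flag" idea; the output dir,
# hyperparameters, and DeepSpeed config path are placeholders.
from transformers import TrainingArguments

use_deepspeed = True  # flip to False for plain DDP execution

training_args = TrainingArguments(
    output_dir="./llama2-7b-finetune",
    per_device_train_batch_size=8,
    num_train_epochs=2,
    # transformers' Trainer picks up the DeepSpeed config (ZeRO stage, offload, ...)
    # only when this is set; with None it trains without DeepSpeed.
    deepspeed="./deepspeed_zero3.json" if use_deepspeed else None,
)
# The rest of the training function (model, datasets, Trainer) stays identical
# for both execution modes.
```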

Hi, updated according to your comments:

  1. Removed the unnecessary cell outputs and kept only the important final outputs.
  2. Merged the two notebooks into one; the single notebook can run different training methods under different execution modes on HPU.
  3. Yes. If you don't want LoRA training, just remove the LoRA conversion when loading the pre-trained model (see the sketch below).
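For illustration (not the exact notebook code; the model name and LoRA hyperparameters are placeholders), the LoRA conversion can simply be guarded by a flag:

```python
# Illustrative only: guard the LoRA conversion with a flag so that
# full-parameter fine-tuning simply skips it. Model name and LoRA
# hyperparameters are placeholders, not the notebook's exact values.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

use_lora = True  # set to False for full-parameter fine-tuning

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
if use_lora:
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
# With use_lora=False, all model parameters are updated during training.
```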

harborn · May 20 '24 09:05

Seems that @harborn addressed the comments. @justinvyu could you take a look again?

woshiyyya · May 22 '24 21:05

@justinvyu please take a look again! Thanks.

harborn · May 24 '24 02:05

@harborn I think you need to add back the orphan: True metadata: https://github.com/ray-project/ray/pull/44667#discussion_r1600456197

justinvyu · May 28 '24 18:05

Fixed.

harborn · May 29 '24 01:05