torchtitan

[Possible PR discussion] Would a PR for training HF models be welcome?

[Open] junjzhang opened this issue 10 months ago • 7 comments

Hi! We are developing a new Reinforcement Learning (RL) training framework modeled after TorchTitan. Recently, we added support for training Hugging Face (HF) models directly and for loading safetensors checkpoints in an online, sharded fashion. This can substantially cut the cost of adapting a new model: all you have to do is implement the parallelism-applying function. Given this, would a PR with the relevant code and an example of training Hugging Face's Llama model be welcome? I think this addition would benefit many in the community. By the way, in my testing, the HF Llama model achieved TPS (tokens per second) competitive with the model implemented in TorchTitan.

junjzhang, Feb 28 '25
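Loading safetensors "in online sharded fashion" means each rank reads only its own portion of every weight instead of materializing the full checkpoint. The sketch below computes that portion, assuming weights are split along dim 0 into ceil-sized chunks in the style of torch.chunk (an assumption about the layout; the PR's actual scheme may differ). `shard_rows` and `plan_sharded_load` are hypothetical helpers, and the tensor shapes in the usage note are illustrative.

```python
import math


def shard_rows(num_rows: int, world_size: int, rank: int) -> tuple[int, int]:
    """Return the [start, end) dim-0 range owned by `rank`.

    Assumes torch.chunk-style splitting: every chunk has ceil(num_rows / world_size)
    rows, so trailing ranks may get fewer rows or an empty range.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    chunk = math.ceil(num_rows / world_size)
    start = min(rank * chunk, num_rows)
    end = min(start + chunk, num_rows)
    return start, end


def plan_sharded_load(
    shapes: dict[str, tuple[int, ...]], world_size: int, rank: int
) -> dict[str, tuple[int, int]]:
    """Map each tensor name to the dim-0 slice this rank should read.

    With the safetensors package, such a slice could be read lazily via
    safe_open(path, framework="pt").get_slice(name)[start:end].
    """
    return {name: shard_rows(shape[0], world_size, rank) for name, shape in shapes.items()}
```

For example, `plan_sharded_load({"model.embed_tokens.weight": (128256, 4096), "model.norm.weight": (4096,)}, world_size=4, rank=1)` assigns rows 32064 to 64128 of the embedding and rows 1024 to 2048 of the norm weight to rank 1, so no rank ever reads the whole tensor.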

Hi @junjzhang - I can only speak for myself, but generally anything that helps Titan enable RL-type training would be of significant interest. We are also opening up a new "experimental" folder to give more contributions a home, so that's another angle that may help your PR land. The first PR landing there also uses HF components for reference (see https://github.com/pytorch/torchtitan/blob/main/torchtitan/experiments/deepseek_v3/attn_mask_utils.py).

Thus, while I don't think anyone can say an unseen PR will 100% be accepted, I can say it would definitely be of interest, and I think it would be worth the effort to post the PR so it can be reviewed/discussed/considered for inclusion. Thanks very much for opening up the discussion!
Maybe @tianyu-l can weigh in here as well.

lessw2020, Feb 28 '25


Thanks for replying! I'll clean up my code and open a draft PR to the experiments dir first!

junjzhang, Feb 28 '25

Hey @junjzhang thanks for proposing! We agree this feature is good to have.

As @lessw2020 suggested, let's create a new folder under the experiments folder to host HF training, which should:

  1. load HF model weights
  2. showcase a training example that implements "the parallelism applying function" and reuses TrainSpec
  3. support converting weights back to HF format

Relevant discussions:

  • https://github.com/pytorch/torchtitan/issues/420
  • https://github.com/pytorch/torchtitan/issues/743
  • https://github.com/pytorch/torchtitan/issues/824

Maybe we can work on this project with other people who've shown interest and made offline progress. cc: @yzhangcs @neeldani @huyiwen @bkchang

tianyu-l, Mar 02 '25
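Item 2 of the plan centers on "the parallelism applying function". As a rough illustration of what such a function might set up for HF's Llama, the sketch below builds a tensor-parallel plan keyed by module path. It assumes the standard `LlamaForCausalLM` layout (`model.layers.{i}.self_attn.q_proj`, `model.layers.{i}.mlp.gate_proj`, ...) and a Megatron-style column/row split; it is not the PR's code. Styles are plain strings so the sketch runs without torch.

```python
# Hypothetical sketch of a TP plan for an HF-style Llama model.
# In a real parallelize function, these strings would be replaced by
# ColwiseParallel() / RowwiseParallel() from torch.distributed.tensor.parallel,
# and the plan passed to parallelize_module(model, tp_mesh, plan).
COLWISE = "colwise"
ROWWISE = "rowwise"

# Per-decoder-layer plan, relative to model.layers.{i}.
# Projections that fan out (q/k/v, gate/up) are split by output columns;
# the projections that fan back in (o_proj, down_proj) are split by input rows,
# so each attention/MLP block needs only one reduction of its output.
LAYER_PLAN = {
    "self_attn.q_proj": COLWISE,
    "self_attn.k_proj": COLWISE,
    "self_attn.v_proj": COLWISE,
    "self_attn.o_proj": ROWWISE,
    "mlp.gate_proj": COLWISE,
    "mlp.up_proj": COLWISE,
    "mlp.down_proj": ROWWISE,
}


def build_tp_plan(num_layers: int) -> dict[str, str]:
    """Expand the per-layer plan into fully qualified module names."""
    plan: dict[str, str] = {}
    for i in range(num_layers):
        for submodule, style in LAYER_PLAN.items():
            plan[f"model.layers.{i}.{submodule}"] = style
    return plan
```

Keeping the plan as data makes it easy to reuse across HF architectures that share these module names. Depending on the transformers version, the attention module's per-rank head bookkeeping may also need adjusting once q/k/v are sharded; that detail is outside this sketch.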


I've finished features 1 and 2, and I think feature 3 can be implemented easily by reusing PreTrainedModel's weight-saving method (save_pretrained). I'll clean up the relevant code and open a PR this week. BTW, this feature introduces extra requirements such as transformers. How would you like that to be handled in the experiments dir?

junjzhang, Mar 03 '25
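On the question of extra requirements such as transformers: one possible pattern (an assumption, not something decided in this thread) is to import the dependency only inside the experiment and fail with an actionable message when it is missing, so the core package does not need it. `require_optional` is a hypothetical helper.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(package: str, hint: str) -> ModuleType:
    """Import an optional top-level dependency, or raise a clear ImportError."""
    # find_spec returns None for a missing top-level package without importing it.
    if importlib.util.find_spec(package) is None:
        raise ImportError(f"'{package}' is required for this experiment; {hint}")
    return importlib.import_module(package)
```

Inside the experiment this might be called as `transformers = require_optional("transformers", "install it with pip install transformers")`, while the rest of torchtitan never imports it.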

@lessw2020 @tianyu-l Could you review this PR https://github.com/pytorch/torchtitan/pull/919 ?

junjzhang, Mar 03 '25

Hi @junjzhang - yes, just saw it - thanks for the PR, will take a look today!

lessw2020, Mar 03 '25

Thanks for the PR. I left some comments.

tianyu-l, Mar 04 '25