Junjie ZHANG comments

Results: 17 comments by Junjie ZHANG

[Experimental Feature] Huggingface model training

> Is there a plan to deduplicate the code from the main TorchTitan? What's the motivation for duplicating `main.py` or `train()`? Is it because of `state_dict` loading? If so, we...

[Experimental Feature] Huggingface model training

> @junjzhang Are you still interested in rebasing the PR and resolving the feedback to get it merged?
>
> We've done some refactoring of `train.py`, and https://github.com/pytorch/torchtitan/pull/1238 is doing a...

[Experimental Feature] Huggingface model training

> Interested in this support, @junjzhang are you still working on this PR?

I encountered several issues in a scaled production environment. I may clean up my code and open a PR later.

[Possible PR discuss] Will a PR of training HF model be welcomed?

> Hi [@junjzhang](https://github.com/junjzhang) - I can only speak my opinion, but generically anything that helps Titan enable RL type training would be of significant interest. We are also opening up...

[Possible PR discuss] Will a PR of training HF model be welcomed?

> Hey [@junjzhang](https://github.com/junjzhang) thanks for proposing! We agree this feature is good to have.
>
> As [@lessw2020](https://github.com/lessw2020) suggested, let's create new folder hosting HF training under the `experiments` folder:...

[Possible PR discuss] Will a PR of training HF model be welcomed?

@lessw2020 @tianyu-l Could you review this PR? https://github.com/pytorch/torchtitan/pull/919

Maybe use reference stash to replace record stream to reduce mem peak

https://github.com/deepseek-ai/DeepEP/pull/456 is a tested PR; it could be discussed further.