Junjie ZHANG
> Is there a plan to deduplicate the code from the main TorchTitan? What's the motivation of duplicating `main.py` or `train()`? Is it because of `state_dict` loading? If so, we...
> @junjzhang Are you still interested in rebasing the PR and resolving the feedback to get it merged? > > We've done some refactoring of `train.py`, and https://github.com/pytorch/torchtitan/pull/1238 is doing a...
> Interested in this support, @junjzhang are you still working on this PR?

I encountered several issues in a scaled production environment. I may clean up my code and open a PR later.
> Hi [@junjzhang](https://github.com/junjzhang) - I can only speak my opinion, but generically anything that helps Titan enable RL type training would be of significant interest. We are also opening up...
> Hey [@junjzhang](https://github.com/junjzhang) thanks for proposing! We agree this feature is good to have. > > As [@lessw2020](https://github.com/lessw2020) suggested, let's create new folder hosting HF training under the `experiments` folder:...
@lessw2020 @tianyu-l Could you review this PR https://github.com/pytorch/torchtitan/pull/919 ?
https://github.com/deepseek-ai/DeepEP/pull/456 This PR has been tested and is open for further discussion.