Junjie Wang

Results: 8 issues by Junjie Wang

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #96989 Differential Revision: [D44158327](https://our.internmc.facebook.com/intern/diff/D44158327)

better-engineering
ciflow/trunk
release notes: distributed (sharded)
ciflow/periodic

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #96985 * #96989 Differential Revision: [D44158326](https://our.internmc.facebook.com/intern/diff/D44158326)

better-engineering
ciflow/trunk
release notes: distributed (sharded)
ciflow/periodic

As part of the ShardedTensor deprecation, we are starting to clean up its use in torchsnapshot. This is the first PR in a series, and we want to get feedback...

CLA Signed

In this PR, we mainly measured the performance and loss curves for the 405B model with some optimization techniques we recently developed. We also want to log the actual peak TFLOPs...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #134528 * #134383 - This PR generates a more useful output log for users: P1552399180. - It also fixes the logic when...

oncall: distributed
topic: not user facing
suppress-bc-linter

This is the first step toward including more models in torchtitan to demonstrate the composability of pretraining. Llama 3.2 is coming, and we already have it available in torchtune. We...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #140975 We added `CudaEventCache` in https://github.com/pytorch/pytorch/pull/133727, a feature that tries to reuse CudaEvent so that we don't call destroy...

oncall: distributed
ciflow/trunk
release notes: distributed (c10d)