Ivan Kobzarev
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #134021 * #132755 * #132638
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #134681 cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec
## Description Describe your changes in detail. ## Motivation and Context Why is this change required? What problem does it solve? If it fixes an open issue, please link to...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #2756 * __->__ #2755 Enablement is copy-pasted from torchtitan, with only one difference: the output of grouped_mm is uninitialized after offsets[-1] (result...
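The uninitialized-tail behavior can be illustrated with a plain-Python reference sketch. It assumes, as a hypothetical convention, that `offsets` holds cumulative end-row indices per group (group g covers rows `offsets[g-1]` up to `offsets[g]`, with group 0 starting at row 0). `grouped_mm_reference` and its `zero_tail` flag are illustrative names, not torchtitan or PyTorch APIs, and NaN stands in for uninitialized memory. Since the PR text is truncated, this does not claim to reproduce how the PR itself handles the tail rows.

```python
from typing import List

Matrix = List[List[float]]


def grouped_mm_reference(x: Matrix, weights: List[Matrix], offsets: List[int],
                         zero_tail: bool = True) -> Matrix:
    """Hypothetical reference for a grouped matmul (illustration only).

    Group g owns rows [start_g, offsets[g]) of x, where start_0 = 0 and
    start_g = offsets[g - 1], and multiplies them by weights[g].
    Rows at or after offsets[-1] belong to no group.
    """
    n_out = len(weights[0][0])
    nan = float("nan")
    # Stand-in for uninitialized memory: every output row starts as NaN.
    out = [[nan] * n_out for _ in x]
    start = 0
    for g, end in enumerate(offsets):
        w = weights[g]
        for r in range(start, end):
            # Row r of the output is x[r] @ weights[g].
            out[r] = [sum(x[r][k] * w[k][j] for k in range(len(w)))
                      for j in range(n_out)]
        start = end
    if zero_tail:
        # Rows past offsets[-1] are never written by any group; zero them
        # explicitly so downstream reductions do not see garbage.
        for r in range(offsets[-1], len(x)):
            out[r] = [0.0] * n_out
    return out


# Usage: 4 rows, two groups covering rows [0, 1) and [1, 3); row 3 is padding.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # group 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # group 1: 2 * identity
]
offsets = [1, 3]
print(grouped_mm_reference(x, weights, offsets))
```

With `zero_tail=False`, the padding row keeps its NaN placeholder, which mirrors why a consumer cannot rely on the contents of rows after `offsets[-1]`.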
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2756 * #2755
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2644
optimizer.step() is compilable for most optimizers; in moe.py, for the cpu_offload case with the 17B model, it helps increase throughput from 4 to 10 tokens per second ``` --- a/torchtune/modules/moe/moe.py +++...
```
python benchmarks/dynamo/torchbench.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --only torchrec_dlrm
```
```
Traceback (most recent call last):
  File "/data/users/ivankobzarev/a/pytorch/benchmarks/dynamo/common.py", line 2744, in validate_model
    self.model_iter_fn(model, example_inputs)
  File...
```
Running from pytorch:
```
python benchmarks/dynamo/torchbench.py --only mobilenet_v2_quantized_qat --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --output out.csv
```
Error:
```
cuda train mobilenet_v2_quantized_qat
Traceback (most recent call last):
  File...
```
```
python benchmarks/dynamo/torchbench.py --only sam --accuracy --no-translation-validation --training --amp --backend inductor --disable-cudagraphs --device cuda
```
```
cuda train sam
Traceback (most recent call last):
  File "/data/users/ivankobzarev/a/pytorch/benchmarks/dynamo/common.py", line 2744, in validate_model...
```