Ivan Kobzarev issues

Results: 46 issues of Ivan Kobzarev

WIP separate tokens for backward

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #134021 * #132755 * #132638

Stale

ciflow/inductor

[WIP][tests] TEST_WITH_SUBCLASSES env var

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #134681 cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec

Stale

release notes: releng

module: dynamo

ciflow/inductor

[DEBUG] ppo compile

## Description Describe your changes in detail. ## Motivation and Context Why is this change required? What problem does it solve? If it fixes an open issue, please link to...

CLA Signed

[llama4] use grouped_mm in moe for sm90

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #2756 * __->__ #2755 Enablement is copy-pasted from torchtitan. There is only one difference: the output of grouped_mm is uninitialized after offsets[-1] (result...

CLA Signed
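The grouped_mm entry above notes that the output is uninitialized after offsets[-1]. The sketch below is a pure-Python model of that behavior, not the PyTorch kernel or the PR's code: `grouped_mm_reference` and `zero_padding_rows` are hypothetical helpers, `None` stands in for uninitialized memory, and zeroing the trailing rows is only one assumed way a caller might handle them.

```python
def grouped_mm_reference(x, weights, offsets):
    """Pure-Python stand-in for a grouped matmul (illustration only).

    x:       list of rows, already sorted by group
    weights: one K x N matrix per group
    offsets: cumulative end row index of each group
    Rows at index >= offsets[-1] are never written; None marks them
    as uninitialized (a real kernel would leave arbitrary memory there).
    """
    n_cols = len(weights[0][0])
    out = [None] * len(x)
    start = 0
    for w, end in zip(weights, offsets):
        for r in range(start, end):
            out[r] = [
                sum(x[r][k] * w[k][c] for k in range(len(w)))
                for c in range(n_cols)
            ]
        start = end
    return out


def zero_padding_rows(out, offsets, n_cols):
    """Hypothetical cleanup: overwrite rows past offsets[-1] with zeros."""
    valid = offsets[-1]
    return out[:valid] + [[0.0] * n_cols for _ in out[valid:]]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]  # last row is padding
weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # group 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # group 1: scale by 2
]
offsets = [2, 3]  # group 0 owns rows 0..1, group 1 owns row 2, row 3 is unused

raw = grouped_mm_reference(x, weights, offsets)
clean = zero_padding_rows(raw, offsets, n_cols=2)
```

Here `raw[3]` is never written, so any code that reads the full output without masking would consume garbage in a real kernel; `clean` replaces that row with zeros before further use.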

[WIP][DEBUG] llama4 debugging

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2756 * #2755

CLA Signed

WIP-DEBUG-PROFILE torch.compile

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2644

CLA Signed

Add torch.compile to optimizer.step()

optimizer.step() is compilable for most optimizers in moe.py. For the cpu_offload case with the 17B model, it raises throughput from 4 to 10 tokens per second. ``` --- a/torchtune/modules/moe/moe.py 03:49:55 [25/39075] +++...

enhancement

best practice

[eager_fail_to_run] Fail to run torchrec_dlrm with --bfloat16

```
python benchmarks/dynamo/torchbench.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --only torchrec_dlrm
```

```
Traceback (most recent call last):
  File "/data/users/ivankobzarev/a/pytorch/benchmarks/dynamo/common.py", line 2744, in validate_model
    self.model_iter_fn(model, example_inputs)
  File...
```

[eager_fail_to_run] mobilenet_v2_quantized_qat, eager run failed on dtype mismatch

Running from pytorch:

```
python benchmarks/dynamo/torchbench.py --only mobilenet_v2_quantized_qat --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --output out.csv
```

Error:

```
cuda train mobilenet_v2_quantized_qat
Traceback (most recent call last):
  File...
```

[eager_fail_to_run] cuda train sam

```
python benchmarks/dynamo/torchbench.py --only sam --accuracy --no-translation-validation --training --amp --backend inductor --disable-cudagraphs --device cuda
```

```
cuda train sam
Traceback (most recent call last):
  File "/data/users/ivankobzarev/a/pytorch/benchmarks/dynamo/common.py", line 2744, in validate_model...
```