Yan Wang

Results 78 comments of Yan Wang

Hard to say whether nvFuser support for embedding (or another operation) can help save memory; according to my earlier analysis of the 2-layer case, the operators appearing in the Thunder trace and...

Synced with @IvanYashchuk; according to the updated benchmark results from 2024-11-07, we'll focus on solving the OOM for Platypus-30B, falcon-40b and vicuna-33b-v1.3 with the Thunder and ThunderFX backends by choosing a proper scaling...

Had a quick check on 1 node (8×H100): `torchrun --nproc_per_node=8 --nnodes=1 /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py --model_name Platypus-30B --micro_batch_size 1 --distributed_mode fsdp --shard_mode zero3 --compile thunder --max_iters 20 --warmup_iters 5` went OOM with factor=0.05; falcon-40b...

Platypus-30B: Number of parameters: 4.07B
Gemma-7b: Number of parameters: 1.17B
so I think Platypus-30B is bigger. When n_layers=20: factor=0.05 => Saved for backward size: 11596.01 MiB, Saved for backward number...
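For a rough sense of scale, the 20-layer measurement above can be extrapolated; this is a back-of-envelope sketch only, and it assumes saved-for-backward memory grows roughly linearly with layer count:

```python
# Back-of-envelope only; assumes saved-for-backward memory scales
# linearly in n_layers, which the per-layer measurement above suggests.
saved_mib_20_layers = 11596.01          # measured at n_layers=20, factor=0.05
per_layer_mib = saved_mib_20_layers / 20

def estimate_saved_mib(n_layers):
    """Linear extrapolation of saved-for-backward memory (MiB)."""
    return per_layer_mib * n_layers

print(round(estimate_saved_mib(60)))    # rough estimate for a 60-layer config
```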

The initial problem of this issue has been solved: Gemma-7b, pythia-12b, Llama-2-13b-hf and Nous-Hermes-13b can run without OOM with the Thunder and ThunderFX backends on 2×8 H100. The current problem is to...

Modifications of the original resnet50 in torchvision:
1. replace `out += identity` with `out = out + identity`
2. use `ReLU(inplace=False)`
3. set `num_batches_tracked = None` for BatchNorm (a workaround since...
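The first modification above can be illustrated without torch at all. This is a hedged sketch (`Buf` is an illustrative stand-in, not a Thunder or torch class) of why `out += identity` differs from `out = out + identity` for a functional tracer: `+=` calls `__iadd__` and mutates the existing buffer in place, which a pure data-flow trace cannot model, while `out + identity` produces a fresh value node.

```python
# Buf is a toy stand-in for a tensor-like object; not a real library class.
class Buf:
    def __init__(self, v):
        self.v = v

    def __add__(self, other):      # out-of-place: returns a fresh object
        return Buf(self.v + other.v)

    def __iadd__(self, other):     # in-place: mutates self, returns self
        self.v += other.v
        return self

a, b = Buf(1), Buf(2)
a_id = id(a)
a += b
assert id(a) == a_id and a.v == 3              # same object, mutated in place

c = Buf(1)
d = c + b
assert id(d) != id(c) and c.v == 1 and d.v == 3  # new object, original intact
```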

> @kiya00 could you try resnet50 w/o modifications? I expect it to work following #633

sure, I'll try the original one

Hi @jjsjann123, there's an error when running `pytest thunder/benchmarks/targets.py -k test_resnet50[backward-thunder]` that could be related to NumberProxy; nvfuserex_impl.py doesn't seem to be able to handle a shape list with mixed...
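To make the failure mode concrete, here is a hedged sketch; `IntegerProxy` and `concretize_shape` are illustrative stand-ins, not the actual Thunder classes. It shows the kind of problematic input, a shape list mixing plain ints with proxy dimensions, and one way a backend could normalize it when the proxy carries a known static value:

```python
# Illustrative stand-in for a proxy wrapping a (possibly dynamic) int dim.
class IntegerProxy:
    def __init__(self, name, value=None):
        self.name = name
        self.value = value            # static value if known, else None

def concretize_shape(shape):
    """Replace proxies that carry a static value with plain ints."""
    out = []
    for dim in shape:
        if isinstance(dim, IntegerProxy):
            if dim.value is None:
                raise ValueError(f"truly dynamic dim: {dim.name}")
            out.append(dim.value)
        else:
            out.append(dim)
    return out

mixed = [64, IntegerProxy("i0", 56), 56]   # mixed list that trips the backend
assert concretize_shape(mixed) == [64, 56, 56]
```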

Another problem is when running `pytest thunder/benchmarks/targets.py -k test_resnet50[backward-thunder+nvfuser+torch.compile]`: the torch.compile executor has trouble handling `prims.convert_element_type` with non-tensor input, e.g. `i4803 = prims.convert_element_type(f4802, int) # i4803: "int 0"`. The torch.compile executor...
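A hedged sketch of the case (the function name and dispatch here are hypothetical, not the actual executor code): for a non-tensor input like `prims.convert_element_type(f4802, int)`, the conversion is just a Python-level cast, so a number-aware wrapper could short-circuit before hitting the tensor path:

```python
# Hypothetical wrapper; the tensor branch assumes a torch-like .to() API.
def convert_element_type(x, typ):
    if isinstance(x, (bool, int, float, complex)):   # non-tensor input
        return typ(x)                                # e.g. int(0.0) -> 0
    return x.to(typ)                                 # tensor path (assumed)

assert convert_element_type(0.0, int) == 0           # the failing case above
assert convert_element_type(True, float) == 1.0
```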

> 2. thunder doesn't really support dynamic shape properly yet. I'm scared about having a shape list with IntegerProxy in it at this point.

I think we should go dig...