Yan Wang

78 comments by Yan Wang

Status update: thanks to @jjsjann123's patch https://github.com/Lightning-AI/lightning-thunder/pull/706, the failure in https://github.com/Lightning-AI/lightning-thunder/pull/451#issuecomment-2186631228 is gone, but there is now an nvfuser failure: "Unsupported loop structure. Two loops are mapped together.bS323{1}...

Hi all, thanks to @jjsjann123's fix, resnet50 works now. Please review again, thanks.

Run this script:

```
import torch
import torchvision
import os
os.environ["NVIDIA_TF32_OVERRIDE"]="0"
os.environ["CUBLAS_WORKSPACE_CONFIG"]=":4096:8"
torch.manual_seed(42)
import random
random.seed(42)
torch.use_deterministic_algorithms(True)
model = torchvision.models.resnet18(weights=None).to(device="cuda", dtype=torch.float32)
x = torch.randn((1, 3, 224, 224), dtype=torch.float32, device="cuda", requires_grad=True)
...
```

Comparison of fp64 and fp32 results:
Torch eager fp64 vs thunder torchex fp32:
Mismatched elements: 9248 / 9408 (98.3%)
Greatest absolute difference: 0.00013152781110292722 at index (33, 2, 5, 6) (up...
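For readers unfamiliar with this kind of report, here is a minimal pure-Python sketch of how such mismatch statistics can be computed. It is not the code that produced the numbers above. It assumes an element counts as mismatched when `|actual - expected| > atol + rtol * |expected|`, and the default tolerances are assumed to be the ones `torch.testing.assert_close` uses for float32. `mismatch_report` is a hypothetical helper name, and the sample values are made up for illustration.

```python
def mismatch_report(actual, expected, rtol=1.3e-6, atol=1e-5):
    """Count elements outside tolerance and locate the largest absolute difference.

    Assumption: an element mismatches when |a - e| > atol + rtol * |e|.
    """
    mismatched = 0
    greatest_abs_diff = 0.0
    greatest_index = None
    for i, (a, e) in enumerate(zip(actual, expected)):
        diff = abs(a - e)
        if diff > atol + rtol * abs(e):
            mismatched += 1
        if diff > greatest_abs_diff:
            greatest_abs_diff = diff
            greatest_index = i
    return mismatched, len(expected), greatest_abs_diff, greatest_index


# Made-up flat data standing in for a higher-precision reference and a lower-precision result.
reference = [1.0, 2.0, 3.0, 4.0]
candidate = [1.0, 2.0001, 3.0, 4.5]

mismatched, total, max_diff, max_idx = mismatch_report(candidate, reference)
print(f"Mismatched elements: {mismatched} / {total} ({100 * mismatched / total:.1f}%)")
print(f"Greatest absolute difference: {max_diff} at index {max_idx}")
```

A high mismatch percentage combined with a small greatest absolute difference, as in the report above, means that most elements fall just outside a tight tolerance rather than being far off.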

By reducing n_layers, we observe the following peak allocated memory (GB):

n_layers | torch compile | thunderfx
-- | -- | --
16 | 41.68 | 42.72
2 | 8.18 |...

Hi @mruberry @t-vi, I modified the test to use hypothesis and updated the test time in the PR description.

Hi @t-vi, although I couldn't reproduce the error (I pulled the litgpt main and the thunder main and ran the test), I believe this should fix the litgpt...

Hi @tfogal, thank you for filing this. Let me think about how to add these options and get back to you.

Hi @tfogal, currently each GraphModule produced by splitting can run with one of three backends: `thunder.jit` with the specified options, `torch.compile` with the specified options, or eager, e.g.: ``` import torch...

Hi @tfogal , since we want to run multiple backends, I'm thinking that your use case is probably better served if we modify the interface like this, WDYT? We pass...