Yan Wang
Status update: Thanks to @jjsjann123's patch https://github.com/Lightning-AI/lightning-thunder/pull/706, the failure in https://github.com/Lightning-AI/lightning-thunder/pull/451#issuecomment-2186631228 is gone, but there is an nvFuser failure about "Unsupported loop structure. Two loops are mapped together.bS323{1}...
Hi all, thanks to @jjsjann123's fix, ResNet50 works now. Please help review again, thanks!
When comparing Thunder Torch Executor to Torch Eager, the ResNet18 gradients are not close for FP32.
Run this script:
```
import os
import random

import torch
import torchvision

os.environ["NVIDIA_TF32_OVERRIDE"] = "0"
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.manual_seed(42)
random.seed(42)
torch.use_deterministic_algorithms(True)

model = torchvision.models.resnet18(weights=None).to(device="cuda", dtype=torch.float32)
x = torch.randn((1, 3, 224, 224), dtype=torch.float32, device="cuda", requires_grad=True)
...
```
Comparison of fp64 and fp32 results:
```
Torch eager fp64 vs thunder torchex fp32:
Mismatched elements: 9248 / 9408 (98.3%)
Greatest absolute difference: 0.00013152781110292722 at index (33, 2, 5, 6) (up...
```
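The mismatch report comes from an elementwise closeness check. Below is a minimal plain-Python sketch of that check, assuming the `torch.testing.assert_close` rule |actual - expected| <= atol + rtol * |expected| with its float32 defaults (rtol = 1.3e-6, atol = 1e-5). Which tolerances the actual comparison used is not stated, so treat those defaults as an assumption; the helper name `mismatch_report` and the sample values are hypothetical.

```python
# Default float32 tolerances of torch.testing.assert_close (assumed to apply here).
RTOL = 1.3e-6
ATOL = 1e-5


def mismatch_report(actual, expected, rtol=RTOL, atol=ATOL):
    """Elementwise closeness check on flat sequences (hypothetical helper).

    An element passes if |actual - expected| <= atol + rtol * |expected|.
    Returns (num_mismatched, total, greatest_abs_diff, flat_index_of_greatest).
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    mismatched = 0
    greatest, greatest_idx = 0.0, None
    for i, (a, e) in enumerate(zip(actual, expected)):
        diff = abs(a - e)
        if diff > atol + rtol * abs(e):
            mismatched += 1
        # Track the largest absolute difference over all elements.
        if diff > greatest:
            greatest, greatest_idx = diff, i
    return mismatched, len(expected), greatest, greatest_idx


# Hypothetical values, not taken from the ResNet18 run.
expected = [0.0123, -0.0456, 0.0789]                 # fp64 reference
actual = [0.0123 + 1.3e-4, -0.0456, 0.0789 + 2e-6]   # result under test
print(mismatch_report(actual, expected))
```

With these defaults, an absolute difference of about 1.3e-4 passes only where |expected| >= (1.3e-4 - 1e-5) / 1.3e-6, i.e. roughly 92, so any element of smaller magnitude is reported as a mismatch.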
By reducing `n_layers`, we observe the following peak allocated memory (GB):

n_layers | torch.compile | thunderfx
-- | -- | --
16 | 41.68 | 42.72
2 | 8.18 | ...
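A back-of-the-envelope reading of the fully shown numbers, as a minimal sketch that assumes peak memory grows roughly linearly with `n_layers` (an assumption, not something the measurements establish). The thunderfx value for `n_layers = 2` is truncated, so only the 16-layer thunderfx number is used:

```python
# Peak allocated memory (GB) for the fully shown measurements.
torch_compile = {16: 41.68, 2: 8.18}
thunderfx = {16: 42.72}  # the n_layers=2 value is truncated in the thread

# Assumption: peak memory is roughly linear in n_layers, so the slope
# between the two torch.compile points approximates the cost per layer.
per_layer_gb = (torch_compile[16] - torch_compile[2]) / (16 - 2)

# Extra peak memory of thunderfx over torch.compile at n_layers=16.
overhead_16_gb = thunderfx[16] - torch_compile[16]
overhead_pct = 100 * overhead_16_gb / torch_compile[16]

print(f"torch.compile: ~{per_layer_gb:.2f} GB per layer")
print(f"thunderfx overhead at 16 layers: {overhead_16_gb:.2f} GB ({overhead_pct:.1f}%)")
```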
Hi @mruberry @t-vi, I modified the test to use hypothesis, and the updated test time is in the PR description.
Hi @t-vi, although I couldn't reproduce the error (I pulled litgpt main and thunder main and ran the test), I believe this should fix the litgpt...
Hi @tfogal, thank you for filing this. Let me think about how to add these options and get back to you.
Hi @tfogal, currently each GraphModule produced by the split can be run with 3 backends: `thunder.jit` with the specified options, `torch.compile` with the specified options, and eager, e.g.:
```
import torch
...
```
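The idea of running every split module under several backends, each with its own options, and checking the results against eager can be sketched in plain Python. This is a minimal sketch under stated assumptions: the backend names mirror the comment, but the wrappers are hypothetical stand-ins (they return the module unchanged and ignore their options) so the sketch runs without torch or thunder; they are not calls into `thunder.jit` or `torch.compile`, and `run_all_backends` / `mismatches_vs_eager` are illustrative names, not an existing API.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# A backend takes (module, options) and returns a callable to run.
Backend = Callable[[Callable[..., Any], Dict[str, Any]], Callable[..., Any]]


def _stand_in_compiler(fn: Callable[..., Any], options: Dict[str, Any]) -> Callable[..., Any]:
    # Hypothetical stand-in: a real backend would compile `fn` using `options`.
    return fn


BACKENDS: Dict[str, Backend] = {
    "thunder": _stand_in_compiler,        # stand-in for thunder.jit(fn, **options)
    "torch_compile": _stand_in_compiler,  # stand-in for torch.compile(fn, **options)
    "eager": lambda fn, options: fn,      # eager runs the module as-is
}


def run_all_backends(
    split_modules: Sequence[Callable[..., Any]],
    inputs: Sequence[Tuple[Any, ...]],
    options_per_backend: Dict[str, Dict[str, Any]],
) -> Dict[Tuple[int, str], Any]:
    """Run every split module with every backend; return {(index, backend): output}."""
    results: Dict[Tuple[int, str], Any] = {}
    for idx, module in enumerate(split_modules):
        for name, backend in BACKENDS.items():
            compiled = backend(module, options_per_backend.get(name, {}))
            results[(idx, name)] = compiled(*inputs[idx])
    return results


def mismatches_vs_eager(results: Dict[Tuple[int, str], Any]) -> List[Tuple[int, str]]:
    """List the (index, backend) pairs whose output differs from eager."""
    return [
        key for key, out in results.items()
        if key[1] != "eager" and out != results[(key[0], "eager")]
    ]


# Usage: two toy "split modules"; "some_option" is a hypothetical option name.
res = run_all_backends([lambda a, b: a + b, lambda a: a * 2], [(1, 2), (5,)],
                       {"thunder": {"some_option": True}})
print(res[(0, "eager")], res[(1, "thunder")])  # 3 10
print(mismatches_vs_eager(res))  # []
```

Keeping options keyed by backend name lets each compiler receive only its own settings. With real tensors, the `!=` check would be replaced by a tolerance-based comparison such as `torch.testing.assert_close`.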
Hi @tfogal, since we want to run multiple backends, I think your use case is probably better served if we modify the interface like this. WDYT? We pass...