Yan Wang
It can be reproduced in container `pjnl-20240830-mixology_70d843cd` but not `pjnl-20240830`
A minimal reproducer is:
```
import torch
import thunder

def fun(x):
    x = x * torch.tensor(0.5, dtype=x.dtype)
    return x

x = torch.randn((2, 2), dtype=torch.bfloat16).cuda()
# print(fun(x))
jfun = thunder.jit(fun)
jfun(x)
```
Torch can run `cuda...
Hi @t-vi @IvanYashchuk , we can discuss further whether https://github.com/Lightning-AI/lightning-thunder/pull/976 is necessary; this is a bug I found along the way, so I split it out so we can review...
I didn't exclude the operators that return views in the auto registration, since Ivan mentioned the stride information is not used. And now I find I didn't add the tensor view...
Hi @t-vi @IvanYashchuk , I rephrased it a bit; the main purpose of this notebook is to give an example of writing a simple functional Python function for a PyTorch module...
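As a rough illustration of that idea (a hypothetical sketch, not the notebook's actual code), a functional forward for a module can be written by passing the parameters explicitly, e.g. via `torch.func.functional_call`:
```
import torch
import torch.nn as nn

# A small module whose forward we want as a pure function of (params, inputs).
module = nn.Linear(4, 2)

def functional_forward(params, x):
    # Run the module with the given parameter dict instead of the
    # parameters stored on the module itself.
    return torch.func.functional_call(module, params, (x,))

params = dict(module.named_parameters())
x = torch.randn(3, 4)
out = functional_forward(params, x)
print(out.shape)  # torch.Size([3, 2])
```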
If we run this notebook in CI using the Hugging Face weights, HF_TOKEN is needed and the weights need to be downloaded to `Meta-Llama-3-8B/consolidated.00.pth` under the same folder as...
> The other question I'd have is if our use of the code is OK here (did we ask the gist author, do we think that the notebook is affected by...
The only failing case (https://github.com/Lightning-AI/lightning-thunder/pull/837#issuecomment-2245732546) I can reproduce locally is `FAILED thunder/tests/test_grad.py::test_vjp_correctness_adaptive_avg_pool2d_torch_cuda_thunder.dtypes.float64 - NotImplementedError: VJP for torch.nn.functional.adaptive_avg_pool2d is not implemented`; the reason is that this op only uses torchex.grad_transform. When...
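For context (a hypothetical sketch outside of thunder, not part of the test suite), the gradient of this op can be checked numerically with plain PyTorch, which is roughly what the float64 VJP correctness test exercises:
```
import torch
import torch.nn.functional as F

# Numerically check the gradient of adaptive_avg_pool2d in float64,
# similar in spirit to the failing test_vjp_correctness test.
x = torch.randn(1, 3, 8, 8, dtype=torch.float64, requires_grad=True)
assert torch.autograd.gradcheck(lambda t: F.adaptive_avg_pool2d(t, (4, 4)), (x,))
```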
Here is a comparison of memory usage on 1 node (8*H100) with ZeRO-3 vs. a single H100 with different numbers of layers | | micr_Bs=2,glb_bs=2 | | | zero3 micr_Bs=2,glb_bs=16 | | |...
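Numbers like these can be collected per run along the following lines (a hypothetical sketch; the actual measurement setup is not shown in the comment):
```
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training step of the model under the chosen config ...
peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"peak allocated memory: {peak_gib:.2f} GiB")
```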
By further reducing Llama-2-13b-hf to n_layers=1, the memory usage in this case is related to rematerialization; the only differing part is the memory allocated by `[t93, t103,...
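One quick way to estimate how much memory such a list of intermediate tensors holds (a rough sketch with a placeholder tensor list, not the actual trace tensors) is to sum their storage sizes:
```
import torch

# Placeholder for the intermediates named in the trace (e.g. t93, t103, ...).
tensors = [torch.empty(2048, 5120, dtype=torch.bfloat16) for _ in range(4)]

bytes_held = sum(t.numel() * t.element_size() for t in tensors)
print(f"memory held by intermediates: {bytes_held / 2**20:.1f} MiB")
```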