Wojciech Prazuch

5 comments

Another way would be to simply skip cloning the weights when the module is on the `meta` device:

```python
def swap_linear_layers_for_te(model: nn.Module, swap_layernorm: bool = True) -> None:
    def parameters_cnt(model: nn.Module) ->...
```
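A minimal sketch of that approach, assuming TransformerEngine's `te.Linear` and `te.LayerNorm` constructors; the function name follows the snippet above, but the recursion and copy logic here are illustrative rather than the exact code from the issue:

```python
import torch
import torch.nn as nn
import transformer_engine.pytorch as te


def swap_linear_layers_for_te(model: nn.Module, swap_layernorm: bool = True) -> None:
    """Recursively replace nn.Linear (and optionally nn.LayerNorm) with TE layers."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            new_layer = te.Linear(
                child.in_features, child.out_features, bias=child.bias is not None
            )
            # Skip cloning when the source weights live on `meta`: meta
            # tensors carry no data, so there is nothing to copy.
            if child.weight.device.type != "meta":
                with torch.no_grad():
                    new_layer.weight.copy_(child.weight)
                    if child.bias is not None:
                        new_layer.bias.copy_(child.bias)
            setattr(model, name, new_layer)
        elif swap_layernorm and isinstance(child, nn.LayerNorm):
            new_layer = te.LayerNorm(child.normalized_shape[0], eps=child.eps)
            if child.weight.device.type != "meta":
                with torch.no_grad():
                    new_layer.weight.copy_(child.weight)
                    new_layer.bias.copy_(child.bias)
            setattr(model, name, new_layer)
        else:
            swap_linear_layers_for_te(child, swap_layernorm)
```

The key piece is the `device.type != "meta"` guard: the replacement layer is still constructed, but the data copy is skipped when there is no data to copy.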

Hello @t-vi! Yes, please note my comment:

> I did some investigation and I think this is due to the model being placed on the `meta` device, causing that to...
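For context, a small plain-PyTorch demonstration of why `meta` placement breaks weight cloning (not code from the repository):

```python
import torch.nn as nn

# Parameters created on the `meta` device have shape and dtype but no
# storage, so "cloning" them copies no actual weight values.
linear = nn.Linear(8, 8, device="meta")
print(linear.weight.is_meta)  # True

# Copying data out of a meta tensor is impossible; PyTorch raises.
try:
    linear.weight.to("cpu")
except (NotImplementedError, RuntimeError) as exc:
    print(f"expected failure: {exc}")
```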

Here is my complete workaround that we used in our last runs:

```python
def swap_linear_layers_for_te(model: nn.Module, swap_layernorm: bool = True, device: str = "meta") -> None:
    def parameters_cnt(model: nn.Module) ->...
```
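For completeness, a hedged sketch of the meta-device flow such a `device` parameter supports, in plain PyTorch; the model, sizes, and re-initialization step here are placeholders, not the actual training setup:

```python
import torch
import torch.nn as nn

# Build on `meta`: shapes are tracked but no weight memory is allocated.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

print(next(model.parameters()).is_meta)  # True

# swap_linear_layers_for_te(model, device="meta")  # per the sketch above

# Materialize: to_empty() allocates real but uninitialized storage.
model = model.to_empty(device="cpu")

# Weights must then be re-initialized or loaded from a checkpoint,
# since meta tensors never held any data to clone.
for module in model.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()
```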

Thanks @t-vi! The [PR](https://github.com/Lightning-AI/lightning-thunder/pull/1037) is ready for review.

We have new OOM errors in Thunder for Mistral-7B-v0.2, longchat-13b-16k, and vicuna-7b-v1.5-16k. The three models (shown in red) fail for the configurations shown in the image below.

![Image](https://github.com/user-attachments/assets/edc924bd-6f65-43dc-836e-0347bf79a8ba)

Happy...