Wojciech Prazuch
Another way would be just to skip cloning weights when the module is on the `meta` device:

```python
def swap_linear_layers_for_te(model: nn.Module, swap_layernorm: bool = True) -> None:
    def parameters_cnt(model: nn.Module) -> ...
```
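For concreteness, here is a minimal sketch of what such a guard could look like. The recursive swap structure, the `te.Linear` construction, and the function name are my own assumptions, since the function body above is truncated:

```python
import torch
import torch.nn as nn
import transformer_engine.pytorch as te

def swap_linear_skipping_meta(model: nn.Module) -> None:
    # Hypothetical variant of the swap: replace nn.Linear with te.Linear,
    # cloning weights only when they actually have storage.
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            new_linear = te.Linear(
                child.in_features, child.out_features, bias=child.bias is not None
            )
            # Meta tensors carry no data, so there is nothing to clone;
            # skip the copy and let the caller materialize weights later.
            if not child.weight.is_meta:
                with torch.no_grad():
                    new_linear.weight.copy_(child.weight)
                    if child.bias is not None:
                        new_linear.bias.copy_(child.bias)
            setattr(model, name, new_linear)
        else:
            swap_linear_skipping_meta(child)
```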
Hello @t-vi! Yes, please note my comment:

> I did some investigation and I think this is due to the model being placed on the `meta` device, causing that to...
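For readers unfamiliar with the failure mode, a minimal reproduction (my own illustration, not from the thread) of what happens when weights are copied out of a meta-device module:

```python
import torch

meta_weight = torch.empty(4, 4, device="meta")  # allocated without storage
real_weight = torch.empty(4, 4)

# Copying out of a meta tensor fails with an error along the lines of
# "Cannot copy out of meta tensor; no data!", since there is no data to read.
real_weight.copy_(meta_weight)
```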
Here is my complete workaround that we used in our last runs:

```python
def swap_linear_layers_for_te(model: nn.Module, swap_layernorm: bool = True, device: str = "meta") -> None:
    def parameters_cnt(model: nn.Module) -> ...
```
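A sketch of how such a workaround might be driven end to end, assuming the usual meta-device initialization flow; `MyTransformer` and `config` are hypothetical, and the `to_empty` materialization step is my own illustration rather than part of the truncated snippet above:

```python
import torch

# Build the model under the meta device so no real storage is allocated.
with torch.device("meta"):
    model = MyTransformer(config)  # hypothetical model class and config

# Passing device="meta" lets the swap skip weight cloning entirely.
swap_linear_layers_for_te(model, swap_layernorm=True, device="meta")

# Materialize the (uninitialized) parameters on the target device afterwards.
model = model.to_empty(device="cuda")
```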
Thanks @t-vi ! The [PR](https://github.com/Lightning-AI/lightning-thunder/pull/1037) is ready for review.
We have new OOM errors in Thunder for Mistral-7B-v0.2, longchat-13b-16k, and vicuna-7b-v1.5-16k. The three models (shown in red) fail for the configurations in the image below. Happy...