Masaki Kozuki
The cause seems to be `parallel_residual=True`, as in https://github.com/Lightning-AI/lightning-thunder/issues/246#issuecomment-2302121789
Script to run `litgpt.model.Block` with the config of "stablecode-completion-alpha-3b", whose `parallel_residual` defaults to `True`:

```python
import argparse
import gc

import torch
from litgpt import Config, GPT
from litgpt.model import ...
```
`parallel_residual=True` saves the following intermediate tensors:

```
(cos, sin, t118, t119, t121, t130, t135, t20, t39, t4, t8, t84, t87, t88, t89, t90, t91, t93, t_attn_attn_weight, t_attn_proj_weight, t_mlp_fc_weight, t_mlp_proj_weight, t_norm_1_weight, t_norm_2_weight, ...
```
https://gist.github.com/crcrpar/ce52789c933ca7013049c6eb1ba06366 has the AOT forward and backward. The backward arguments are as follows:

```python
def forward(
    self,
    primals_1: "bf16[2560][1]cuda:0",
    primals_3: "bf16[1, 16384, 2560][41943040, 2560, 1]cuda:0",
    primals_6: "bf16[16384, 20][20, 1]cuda:0",
    primals_7: "bf16[16384, 20][20, ...
```
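As an aside on reading these annotations: each one encodes dtype, shape, strides, and device, e.g. `bf16[1, 16384, 2560][41943040, 2560, 1]cuda:0`. For a contiguous row-major tensor, each stride is the product of the trailing dimensions. A small sketch (plain Python, no torch dependency; the helper name is made up here) to verify that:

```python
def contiguous_strides(shape):
    # Row-major (C-contiguous) strides: stride[i] = prod(shape[i+1:]).
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

# Matches the annotation bf16[1, 16384, 2560][41943040, 2560, 1]:
print(contiguous_strides((1, 16384, 2560)))  # (41943040, 2560, 1)
```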
> File "/home/glm/apex/setup.py", line 4, in
>     from packaging.version import parse, Version
> ModuleNotFoundError: No module named 'packaging'

Could you install `packaging` and retry?
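Concretely, something like the following should unblock the build (the apex install command is only an illustration; use whichever invocation failed for you):

```shell
# apex's setup.py imports packaging at build time, so install it first
python -m pip install packaging
# then retry the apex build, e.g.:
# python -m pip install -v --no-build-isolation ./apex
```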
> @crcrpar, do you remember why we skip `master_weights` for `bfloat16`?

I'm unsure, but I vaguely remember that fused Adam didn't use master weights, so it feels like more...
Overall this sounds great to me. I have some questions and comments to help me understand this proposal better. Q1 -- `setup_operators`: Would it even let us register a custom executor, like...
cc @jpool-nv @ChongyuNVIDIA could you review this?
Hi @h-vetinari, excuse me for having been idle. I just created a new tag which points to the commit shipped in `nvcr.io/nvidia/pytorch:23.05-py3`. I'll try to create the other missing tags...
Did you set the `CUDA_HOME` environment variable? If not, could you try setting it?
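For reference, a typical way to set it, assuming the toolkit lives under `/usr/local/cuda` (a common but not universal install path; adjust to your system):

```shell
# Point CUDA_HOME at the CUDA toolkit root (path is an assumption; adjust to your install)
export CUDA_HOME=/usr/local/cuda
# Sanity check: nvcc should then be found under $CUDA_HOME
# "$CUDA_HOME/bin/nvcc" --version
```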