Carlos Mocholí


It is not set automatically; try this:

```python
import lightning as L

fabric = L.Fabric(devices=1, precision="8-mixed")
dtype = None
```

You'll need to install Transformer Engine first: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-stable-release

Actually, based on https://github.com/NVIDIA/TransformerEngine/discussions/242#discussioncomment-5971270, it seems like we can keep the weights in fp16 or bf16 during inference, meaning we don't need to set `dtype = None`.
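A minimal sketch of that, reusing the snippet above and assuming the same Fabric setup:

```python
import lightning as L
import torch

fabric = L.Fabric(devices=1, precision="8-mixed")
dtype = torch.bfloat16  # keep the weights in bf16 for inference instead of dtype = None
```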

Your fp16 vs fp32 issues might be caused by this: https://github.com/NVIDIA/TransformerEngine/blob/stable/transformer_engine/pytorch/module.py#L3464-L3467. There might be a conflict between that logic and this logic: https://github.com/Lightning-AI/lit-llama/blob/main/generate.py#L129-L131 and https://github.com/Lightning-AI/lit-llama/blob/a24fc5e55e77a020a9a0af305ee6463fd56753c0/lit_llama/utils.py#L128-L133. So maybe the easiest thing...

The assertion you are hitting gets raised under two conditions: https://github.com/NVIDIA/TransformerEngine/blob/stable/transformer_engine/pytorch/module.py#L1675. The 7B config will have `c_attn.shape == (4096, 3*4096)` ([ref](https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/model.py#L30)), and that's divisible by (8, 16) as required by...
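For illustration, a minimal sketch of that divisibility requirement (the `fp8_shape_ok` helper is hypothetical; the real assertion lives in Transformer Engine's `module.py`):

```python
def fp8_shape_ok(in_features: int, out_features: int) -> bool:
    # fp8 GEMMs require the dimensions to be divisible by (8, 16)
    return in_features % 8 == 0 and out_features % 16 == 0


# The 7B config's c_attn projection is (4096, 3 * 4096), which passes
assert fp8_shape_ok(4096, 3 * 4096)
```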

@28Smiles That seems like a completely separate issue from H100 support. Can you open a different issue?

@28Smiles Our inference scripts do not support batch size > 1 at the moment; #188 tracks this.

As far as integrating it into the scripts, I would create an `optimizer` argument in https://github.com/Lightning-AI/litgpt/blob/36c6a77435d75872f525848ee1570467d120ae80/litgpt/finetune/lora.py#L40. To avoid the duplicate registration, you need to skip it when the function arguments are...

@rasbt I pushed a commit with what I would suggest. The `str` code path could be improved if we want to expose arguments like the learning rate outside of the...
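To make the idea concrete, here is a minimal sketch of what such a `str` code path could look like (the names `instantiate_optimizer` and `setup` are illustrative, not the actual litgpt API):

```python
import torch


def instantiate_optimizer(optimizer: str, parameters, **kwargs) -> torch.optim.Optimizer:
    # Resolve the optimizer class by name from torch.optim, e.g. "AdamW"
    optimizer_cls = getattr(torch.optim, optimizer)
    return optimizer_cls(parameters, **kwargs)


def setup(optimizer: str = "AdamW", learning_rate: float = 1e-3) -> None:
    model = torch.nn.Linear(8, 8)  # stand-in for the real model
    optim = instantiate_optimizer(optimizer, model.parameters(), lr=learning_rate)
    print(type(optim).__name__)  # -> "AdamW"


if __name__ == "__main__":
    setup()
```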

The Azure failure does look real:

```python
>       fit(fabric, devices, state, train_dataloader, val_dataloader, out_dir, tokenizer_dir, train, eval, optimizer)
E       TypeError: fit() takes 9 positional arguments but 10 were given

/__w/6/s/extensions/thunder/pretrain.py:229: ...
```
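Assuming the Thunder copy of `fit()` simply wasn't updated for the new `optimizer` argument, the likely fix is to extend its signature to match the call site, roughly:

```python
# Hypothetical sketch: accept the 10th positional argument at the definition
def fit(fabric, devices, state, train_dataloader, val_dataloader, out_dir,
        tokenizer_dir, train, eval, optimizer):
    ...
```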