Results: 427 comments of Carlos Mocholí

@Andrei-Aksionov Feel free to start this work! We won't have time to work on it for now. You might want to work on the lit-gpt repository instead, which also has...

We are focusing more on that project moving forward. It includes support for GPT-NeoX-derived and LLaMA-derived weights.

> Is `#devices` here the number of GPUs per node or the total number (world_size)?

Per node.

> I also checked the finetuning script and found that it's calculated differently:...
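For reference, a minimal Fabric sketch of those semantics (the 4-GPU, 2-node numbers are just an example):

```python
# Minimal sketch, assuming Lightning's Fabric API: `devices` counts GPUs
# per node, so world_size = devices * num_nodes.
import lightning as L

fabric = L.Fabric(accelerator="cuda", devices=4, num_nodes=2)
fabric.launch()

print(fabric.world_size)  # 8: total processes across both nodes
print(fabric.node_rank)   # which node this process runs on
print(fabric.local_rank)  # GPU index within the node (0-3 here)
```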

Thanks for reporting! I also noticed this in #175, but I think it's safe to ignore. We should remove this flag and code in favor of a test comparing with HF,...
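Roughly the kind of parity test I mean, sketched out (the checkpoint name and tolerances are illustrative, not the actual test):

```python
import torch
from transformers import AutoModelForCausalLM

@torch.no_grad()
def test_matches_hf(lit_model, checkpoint: str = "huggyllama/llama-7b"):
    # Compare lit-llama logits against the Hugging Face implementation
    # on the same (random) input ids.
    hf_model = AutoModelForCausalLM.from_pretrained(checkpoint)
    hf_model.eval()
    lit_model.eval()
    token_ids = torch.randint(0, lit_model.config.vocab_size, (1, 16))
    lit_logits = lit_model(token_ids)
    hf_logits = hf_model(token_ids).logits
    torch.testing.assert_close(lit_logits, hf_logits, rtol=1e-4, atol=1e-4)
```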

Can you provide more details about your hardware setup, Python environment, and precise LLaMA config? Thanks!

I'm a bit confused about your LLaMA config, because it doesn't match the values in https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/model.py#L18-L25. But if you say another machine works, then there might be some environment or...
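If it helps debugging, a quick way to dump the config actually in use and compare it field by field against those defaults (assuming lit-llama's `LLaMAConfig`; the "7B" name is an example):

```python
from lit_llama.model import LLaMAConfig

# Print the resolved config to compare against the values in model.py.
config = LLaMAConfig.from_name("7B")
print(config)  # block_size, vocab_size, n_layer, n_head, n_embd
```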

Linked issue in the lightning repo: https://github.com/Lightning-AI/lightning/issues/17172

@SinanAkkoyun Do you have access to H100s? If so, would you like to try out the PR https://github.com/Lightning-AI/lightning/pull/17597? It adds support to Fabric via `L.Fabric(precision="8-mixed")`. You can install it...
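A minimal sketch of what trying it out would look like (requires H100-class hardware and that PR branch installed):

```python
import lightning as L

# Request the FP8 mixed precision added by the PR above.
fabric = L.Fabric(accelerator="cuda", devices=1, precision="8-mixed")
fabric.launch()
print(fabric.strategy.precision)  # confirm the FP8 plugin is active
```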

Yes. Automatic replacement of the layers is missing, but it's something we want to add too. Parallelism customization would be left to the user, though.
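For illustration, the manual replacement would look roughly like this, using NVIDIA Transformer Engine's drop-in `Linear` (a sketch only; weight copying is omitted, and automating this traversal is the part that's missing):

```python
import torch.nn as nn
import transformer_engine.pytorch as te

def replace_linears(module: nn.Module) -> None:
    # Recursively swap every nn.Linear for Transformer Engine's FP8-capable
    # Linear. Copying the original weights over is omitted for brevity.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, te.Linear(child.in_features,
                                            child.out_features,
                                            bias=child.bias is not None))
        else:
            replace_linears(child)
```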

@SinanAkkoyun Did you run `generate.py`? How many tokens/sec do you get without it? How does the generation look? Can you `print(fabric.strategy.precision)` to make sure it's using FP8 precision? We might...
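Something like this sketch would cover both checks (`generate_fn` stands in for the `generate` function in `generate.py`; the argument names are placeholders):

```python
import time
import torch

def check_speed_and_precision(fabric, model, encoded_prompt: torch.Tensor,
                              generate_fn, max_new_tokens: int = 100) -> None:
    # Should report the FP8/mixed precision plugin if it's active.
    print(fabric.strategy.precision)
    t0 = time.perf_counter()
    output = generate_fn(model, encoded_prompt, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - t0
    # lit-llama's generate returns prompt + new tokens as one 1-D tensor.
    tokens_generated = output.size(0) - encoded_prompt.size(0)
    print(f"{tokens_generated / elapsed:.2f} tokens/sec")
```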