
Results 428 comments of Carlos Mocholí

> I am running in 16 bit. `--precision 16-mixed` or `--precision 16-true`?
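For context, `16-mixed` keeps the master weights in FP32 and autocasts operations to FP16 (AMP), while `16-true` converts the model weights themselves to FP16. A minimal Fabric sketch of the difference (the tiny `Linear` here is just a stand-in for the real model):

```python
import torch
import lightning as L

# "16-mixed": weights stay in FP32, ops autocast to FP16 (AMP).
# "16-true":  the module itself is converted to FP16 end to end.
fabric = L.Fabric(accelerator="cuda", devices=1, precision="16-mixed")
fabric.launch()

model = torch.nn.Linear(4096, 4096)  # stand-in for the actual network
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model, optimizer = fabric.setup(model, optimizer)

x = torch.randn(8, 4096, device=fabric.device)
loss = model(x).float().mean()
fabric.backward(loss)
optimizer.step()
```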

I looked into implementing this ([branch](https://github.com/Lightning-AI/lit-gpt/compare/carmocca/mpt?expand=1)). The missing pieces are:

- ALiBi
- Low-precision LayerNorm

And to reproduce training, they also do:

- Tied embedding weights with lm_head
- ...
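For reference, a standalone PyTorch sketch of the ALiBi bias piece (not code from the branch; the function names are made up here):

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Per-head slopes: 2^(-8/n), 2^(-16/n), ..., 2^(-8), for the power-of-two case.
    exponents = torch.arange(1, n_heads + 1, dtype=torch.float32)
    return 2.0 ** (-8.0 * exponents / n_heads)

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Relative distances (j - i): 0 on the diagonal, increasingly negative for
    # tokens further in the past. Scaled per head and added to the attention
    # scores before the softmax; future positions are handled by the causal mask.
    distance = torch.arange(seq_len).view(1, -1) - torch.arange(seq_len).view(-1, 1)
    return alibi_slopes(n_heads).view(n_heads, 1, 1) * distance  # (n_heads, T, T)
```

The per-head slopes follow the geometric sequence from the ALiBi paper (shown for a power-of-two number of heads), and the resulting bias replaces learned positional embeddings.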

You are using FSDP for inference, right? It won't fit in a single 80GB card. How many devices are you using?

The model won't fit into any single 80GB card unless you do quantization. So either you do that, or the model needs to be sharded using FSDP. I don't...
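For the sharded option, roughly what the Fabric setup looks like (the `Sequential` below is just a placeholder for the 40B model; the actual generate scripts differ):

```python
import torch
import lightning as L

# Shard the weights across all local GPUs with FSDP so that no single
# 80GB card has to hold the full model.
fabric = L.Fabric(accelerator="cuda", devices=8, strategy="fsdp", precision="bf16-true")
fabric.launch()

with fabric.init_module():        # materialize weights on the target device/dtype
    model = torch.nn.Sequential(  # placeholder standing in for the 40B checkpoint
        torch.nn.Linear(8192, 8192), torch.nn.GELU(), torch.nn.Linear(8192, 8192)
    )
model = fabric.setup(model)
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 8192, device=fabric.device))
```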

LoRA distributed support is tracked in #161

Regarding training falcon 40b on 8 A100 80GB GPUs, I don't have access to that hardware, but you can try the suggestions in https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md. You'll need to use [sharding](https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md#do-sharding-across-multiple-gpus) as...
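As a rough sketch, that guide's suggestions (sharding plus a small micro batch with gradient accumulation) amount to something like the following with Fabric; the model, hyperparameters, and loop are placeholders rather than the script's actual code:

```python
import torch
import lightning as L

fabric = L.Fabric(devices=8, strategy="fsdp", precision="bf16-mixed")
fabric.launch()

model = torch.nn.Linear(4096, 4096)  # placeholder for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model, optimizer = fabric.setup(model, optimizer)

accumulation_steps = 4  # keep the micro batch small, accumulate to the effective batch size
for step in range(100):
    batch = torch.randn(1, 4096, device=fabric.device)  # micro batch of 1
    is_accumulating = (step + 1) % accumulation_steps != 0
    # Skip the gradient synchronization on the accumulation steps.
    with fabric.no_backward_sync(model, enabled=is_accumulating):
        loss = model(batch).float().mean() / accumulation_steps
        fabric.backward(loss)
    if not is_accumulating:
        optimizer.step()
        optimizer.zero_grad()
```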

[Here](https://github.com/Lightning-AI/lit-gpt/blob/main/finetune/adapter.py#L69) (for example)

We have a guide for dealing with OOMs here: https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md

The pretraining scripts now have a `resume` argument. The finetuning scripts don't have it yet. Contributions would be welcome.
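For anyone picking this up, the idea is just to save the training state and load it back; a minimal Fabric sketch (paths, state keys, and the model are placeholders, not the exact pretraining-script code):

```python
import torch
import lightning as L

fabric = L.Fabric(devices=1)
fabric.launch()

model = torch.nn.Linear(128, 128)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = fabric.setup(model, optimizer)

state = {"model": model, "optimizer": optimizer, "iter_num": 0}

resume = False  # set True to continue a previous run
if resume:
    fabric.load("out/checkpoint.pth", state)  # restores everything in `state` in place

for step in range(state["iter_num"], 10):
    state["iter_num"] = step
    # ... forward/backward/optimizer step ...
    fabric.save("out/checkpoint.pth", state)  # periodically persist a resumable checkpoint
```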