Carlos Mocholí

I have a fix in #171 that will reduce the memory requirements for fine-tuning and training.

@k21993 The fix above also applies to LoRA.

I merged #173, which should fix the FLOPs counter issue. I'll now try to replicate the sequence-length issues you are seeing.
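
For context on what the counter measures: a common rule of thumb estimates training cost at roughly 6 FLOPs per parameter per token. A minimal sketch of that approximation (this is not the #173 fix, and the numbers in the example are illustrative):

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training cost: ~6 FLOPs per parameter per token
    (~2 for the forward pass, ~4 for backward)."""
    return 6.0 * n_params * n_tokens

# Illustrative numbers: a 7B-parameter model seeing 1B tokens
print(f"{estimate_training_flops(7e9, 1e9):.2e} FLOPs")
```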

Hey all. Using current `main`, here's what I'm calling: `python finetune/adapter.py --checkpoint_dir checkpoints/tiiuae/falcon-7b --precision bf16-true`. With `micro_batch_size=1`, I get a constant ~16 GB of memory use. It might seem to slowly creep up,...
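
In case it's useful for comparing numbers, here's a minimal sketch of how I'd measure peak GPU memory around a run like this on a CUDA machine; `run_finetuning` is a hypothetical stand-in for the training loop in `finetune/adapter.py`:

```python
import torch

def report_peak_memory(device: int = 0) -> None:
    # Peak memory held by tensors since the last reset, in GiB
    peak_gib = torch.cuda.max_memory_allocated(device) / 1024**3
    print(f"Peak allocated: {peak_gib:.2f} GiB")

torch.cuda.reset_peak_memory_stats()
# run_finetuning()  # hypothetical stand-in for the fine-tuning loop
report_peak_memory()
```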

I merged #178, which should bring a small decrease in memory usage. I'll also be adding #182, which includes a change so that the longest Alpaca sequence is loaded first,...
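
For context on why loading the longest sequence first helps: peak memory is reached on the very first step, so an out-of-memory failure surfaces immediately rather than deep into a run. A minimal sketch of the idea (not the actual #182 change; the `input_ids` key is an assumption):

```python
from typing import Dict, List

def longest_first(samples: List[Dict]) -> List[Dict]:
    """Return the samples ordered so the longest sequence comes first.

    Peak memory is then hit on step one, so an OOM shows up right away.
    Assumes each sample stores its tokenized prompt under "input_ids".
    """
    return sorted(samples, key=lambda s: len(s["input_ids"]), reverse=True)
```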

Looks like an issue with the model instantiation. Can you pull `main` and call `scripts/convert_hf_checkpoint.py` again?

The same technique should work on Falcon; there's nothing substantially different in how the model is pre-trained.

FSDP also fails with a similar error. This is because we access the LoRA parameters in `.train()` instead of `.forward()` (https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/lora.py#L270-L273). @awaelchli suggested removing these calls from the fine-tuning...
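
To illustrate the distinction, here's a minimal sketch of a LoRA linear layer that applies the low-rank update inside `forward()` instead of merging weights in `.train()`, so FSDP only sees parameter access where it expects it. The class and hyperparameters are illustrative, not the actual lit-llama implementation:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Linear layer whose LoRA update is computed in forward(), so no
    weights are merged or mutated in train()/eval()."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.lora_a = nn.Parameter(torch.empty(r, in_features))
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the low-rank update, computed on the fly:
        # no parameter access happens outside of forward()
        return self.linear(x) + F.linear(F.linear(x, self.lora_a), self.lora_b) * self.scaling
```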

The issue above should only affect LoRA. Is the error exactly the same for Adapter?

Hi. Unfortunately, Python 3.7 is not supported, as the error message indicates. You'll have to upgrade to 3.8, 3.9, or 3.10. See https://pytorch.org/blog/deprecation-cuda-python-support and https://endoflife.date/python
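
For context, the check behind that message is just a startup guard on `sys.version_info`; a minimal sketch of the pattern, with the exact wording here being illustrative:

```python
import sys

# Fail fast on interpreters older than the minimum supported version
if sys.version_info < (3, 8):
    raise RuntimeError(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is not"
        " supported; please upgrade to 3.8, 3.9, or 3.10."
    )
```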