Sebastian Raschka

821 comments by Sebastian Raschka

Implemented all the suggestions, @carmocca. Should be good to review.

Argh, it all works fine with StableLM. But I just noticed that this causes issues with Falcon:

```
size mismatch for transformer.h.20.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from...
```
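For context, a quick sketch of where that 4672 likely comes from, assuming Falcon-7B's usual multi-query attention layout (71 query heads of size 64 plus one shared key/value head); the names below are purely illustrative:

```python
# Illustrative arithmetic (assumed Falcon-7B layout): fused QKV rows =
# query heads * head size + one shared key head + one shared value head.
n_embd, head_size, n_query_heads = 4544, 64, 71
qkv_rows = n_query_heads * head_size + 2 * head_size
print(qkv_rows, n_embd)  # 4672 4544 -> matches torch.Size([4672, 4544])
```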

I just noticed this also needs the `ds_config` for DeepSpeed. Will add this to the PR shortly.

Should we also change this to FSDP before merging, @carmocca, or figure it out later?

Besides FSDP and Falcon, everything should be addressed now. Thanks for the thorough review!

@k21993 LoRA with Falcon 7B should work on a single GPU with ~16 GB. If not, you can change `micro_batch_size = 4` to `micro_batch_size = 1` (it only affects...
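A minimal sketch of that change, assuming the hyperparameters sit at the top of `finetune/lora.py` as in the lit-gpt-style scripts (the surrounding names and values are illustrative):

```python
# Hypothetical excerpt from finetune/lora.py: lowering micro_batch_size reduces peak
# memory; gradient accumulation keeps the effective batch size unchanged.
batch_size = 128
micro_batch_size = 1  # reduced from 4 to fit a single ~16 GB GPU
gradient_accumulation_iters = batch_size // micro_batch_size
```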

That's weird. Here are the complete settings I used: https://github.com/rasbt/LLM-finetuning-scripts/blob/main/lit-benchmarks/falcon-7b/finetune/lora.py, run via

```
python finetune/lora.py --checkpoint_dir checkpoints/tiiuae/falcon-7b/
```

The peak memory use was 16.97 GB according to

```python
print(f"Memory used: {torch.cuda.max_memory_reserved() /...
```
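For reference, that memory report probably looks something like the following sketch; the divisor (bytes to GB) and formatting here are assumptions, not the exact line from the linked script:

```python
import torch

# Report the peak reserved CUDA memory after the run, converted from bytes to GB.
print(f"Memory used: {torch.cuda.max_memory_reserved() / 1e9:.02f} GB")
```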

Regarding the 1-GPU setting you have above, you mention `micro_batch_size = 4`. So if you set this to `micro_batch_size = 1`, it should theoretically work: 67,775 MiB /...

Regarding multi-GPU training, it is currently set to DeepSpeed stage 2, which is not very memory-efficient (it optimizes for speed). If you set this to DeepSpeed stage 3,...
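A minimal sketch of that switch, assuming the script passes a `ds_config` dict to Lightning Fabric's `DeepSpeedStrategy`; the device count, batch settings, and precision below are illustrative:

```python
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy

# ZeRO stage 3 shards optimizer states, gradients, and parameters across GPUs,
# trading some speed for lower per-GPU memory compared to stage 2.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 128,
    "zero_optimization": {"stage": 3},  # was: {"stage": 2}
}

fabric = Fabric(devices=4, strategy=DeepSpeedStrategy(config=ds_config), precision="bf16-mixed")
```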

That makes sense, thanks for clarifying! I could give it a try when I am back from CVPR next week (currently working on code for the talk) but if there...