CUDA OOM: Falcon 7B with block size 3072
Hi,
This might not be possible yet, but I am trying to fine-tune Falcon 7B with a block size of 3072 instead of 2048. I have two 48 GB A6000s and am training on a single device. With a block size of 2048, training uses about 42 GB of memory, but with 3072 I get an OOM error.
I am using DeepSpeed stage 2. I would use stage 3, but it isn't working because of the known issue.
Any ideas, or should I wait for a stage 3 fix / quantization support?
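For reference, here is roughly how the run is set up. This is a sketch, not the actual fine-tuning script; the `GPT`/`Config` import path, the `"falcon-7b"` config name, and the `block_size` override are assumptions about the repo's current layout.

```python
# Rough sketch of the setup described above, not the real fine-tuning script.
# The import path is an assumption; adjust it to your checkout.
import torch
import lightning as L
from lightning.fabric.strategies import DeepSpeedStrategy

from litgpt.model import GPT, Config

strategy = DeepSpeedStrategy(
    stage=2,                  # ZeRO stage 2: shards optimizer state and gradients
    offload_optimizer=False,  # setting this True trades VRAM for CPU RAM if needed
)

fabric = L.Fabric(devices=1, precision="bf16-mixed", strategy=strategy)
fabric.launch()

# Override the default 2048 context length with 3072.
config = Config.from_name("falcon-7b", block_size=3072)
model = GPT(config)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model, optimizer = fabric.setup(model, optimizer)
```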
Thanks, Griffin
I'm replacing DeepSpeed with FSDP in #118. Feel free to try it out and see if it helps before the PR is merged.
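In the meantime, here is a rough sketch of what the swap looks like with Fabric's `FSDPStrategy`. The `Block` import, the wrapping policy, and the flag values are assumptions for illustration, not the code from the PR.

```python
# Sketch: replacing the DeepSpeed strategy with FSDP via Lightning Fabric.
from functools import partial

from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from litgpt.model import Block  # the transformer block class; adjust to your checkout

# Treat each transformer block as its own FSDP unit so parameters, gradients,
# and optimizer state get sharded instead of replicated on every GPU.
auto_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})

strategy = FSDPStrategy(
    auto_wrap_policy=auto_wrap_policy,
    activation_checkpointing=Block,  # recompute block activations to cut memory further
    cpu_offload=False,
)

# Drop-in replacement for the DeepSpeed strategy, e.g. sharded across both A6000s:
# fabric = L.Fabric(devices=2, precision="bf16-mixed", strategy=strategy)
```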