CUDA OOM: Falcon 7B with block size 3072
Hi,
This might not be possible yet, but I am trying to fine-tune Falcon 7B with a block size of 3072 instead of 2048. I have two 48 GB A6000s and am training on a single device. With a block size of 2048, training uses about 42 GB of memory, but with 3072 I get an OOM error.
I am using DeepSpeed stage 2. I would use stage 3, but it isn't working because of the known issue.
Any ideas, or should I wait for a stage 3 fix / quantization support?
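For reference, here is roughly how the run is set up. This is a sketch, not the actual fine-tuning script; the `GPT`/`Config` import path, the `"falcon-7b"` config name, and the `block_size` override are assumptions about the repo's current layout.

```python
# Rough sketch of the setup described above, not the real fine-tuning script.
# The import path is an assumption; adjust it to your checkout.
import torch
import lightning as L
from lightning.fabric.strategies import DeepSpeedStrategy

from litgpt.model import GPT, Config

strategy = DeepSpeedStrategy(
    stage=2,                  # ZeRO stage 2: shards optimizer state and gradients
    offload_optimizer=False,  # setting this True trades VRAM for CPU RAM if needed
)

fabric = L.Fabric(devices=1, precision="bf16-mixed", strategy=strategy)
fabric.launch()

# Override the default 2048 context length with 3072.
config = Config.from_name("falcon-7b", block_size=3072)
model = GPT(config)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model, optimizer = fabric.setup(model, optimizer)
```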
Thanks, Griffin
I'm replacing DeepSpeed with FSDP in #118. Feel free to try it out and see if it helps before the PR is merged.
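In the meantime, here is a rough sketch of what the swap looks like with Fabric's `FSDPStrategy`. The `Block` import, the wrapping policy, and the flag values are assumptions for illustration, not the code from the PR.

```python
# Sketch: replacing the DeepSpeed strategy with FSDP via Lightning Fabric.
from functools import partial

from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from litgpt.model import Block  # the transformer block class; adjust to your checkout

# Treat each transformer block as its own FSDP unit so parameters, gradients,
# and optimizer state get sharded instead of replicated on every GPU.
auto_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})

strategy = FSDPStrategy(
    auto_wrap_policy=auto_wrap_policy,
    activation_checkpointing=Block,  # recompute block activations to cut memory further
    cpu_offload=False,
)

# Drop-in replacement for the DeepSpeed strategy, e.g. sharded across both A6000s:
# fabric = L.Fabric(devices=2, precision="bf16-mixed", strategy=strategy)
```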