ALBEF
How much memory is needed for pre-training
Hello, I am trying to pre-train on 4 RTX 3090s (24 GB each), but I always run out of memory, even after reducing the batch size to 4. Could you tell me how much memory you used and how long the pre-training took?
I used 8 A100s with 40 GB of memory each, and training takes 3-4 days on the 4M dataset. You may want to try fp16 training or gradient checkpointing to reduce memory usage.
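For reference, here is a minimal sketch of combining both techniques with PyTorch's `torch.cuda.amp` and `torch.utils.checkpoint` on a recent PyTorch version; the model, optimizer, and data below are placeholder stand-ins, not ALBEF's actual training loop:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler
from torch.utils.checkpoint import checkpoint_sequential

# Hypothetical stand-in for the pre-training model, not ALBEF's real classes.
model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(12)]).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

for step in range(10):
    x = torch.randn(4, 512, device="cuda")  # dummy input, batch_size=4
    optimizer.zero_grad()
    with autocast():  # run the forward pass in mixed precision (fp16)
        # checkpoint_sequential discards intermediate activations and
        # recomputes them during backward, trading compute for memory
        out = checkpoint_sequential(model, 4, x, use_reentrant=False)
        loss = out.float().pow(2).mean()  # placeholder loss
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()
```

Either technique alone already cuts activation memory substantially; combining them usually lets you raise the batch size again at the cost of a somewhat slower step.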
Thanks a lot, I will try it.