
How much memory is needed for pre-training

Open 2292384454 opened this issue 2 years ago • 2 comments

Hello, I am trying to pre-train on 4 RTX 3090s (24GB each), but I always run out of memory, even after reducing the batch_size to 4. Could you tell me how much memory you used and how long pre-training took?

2292384454 avatar Mar 05 '22 15:03 2292384454

I used 8 A100 GPUs with 40GB of memory each, and training takes 3-4 days on the 4M dataset. You may want to try fp16 training or gradient checkpointing to reduce memory usage.

LiJunnan1992 avatar Mar 06 '22 00:03 LiJunnan1992
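For anyone landing on this thread: below is a minimal sketch of both suggestions in a plain PyTorch setup, using `torch.cuda.amp` for fp16 training and `torch.utils.checkpoint` for gradient checkpointing. The `TinyEncoder` module and the synthetic loop are illustrative placeholders, not ALBEF's actual training code, and the script assumes a CUDA-capable GPU.

```python
# Minimal sketch (not ALBEF code): fp16 training via torch.cuda.amp plus
# gradient checkpointing. Both reduce activation memory on the GPU.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyEncoder(nn.Module):
    """Placeholder stack of layers standing in for a real encoder."""
    def __init__(self, dim=256, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            # Recompute this block's activations during backward instead of
            # storing them (use_reentrant=False needs PyTorch >= 1.11).
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = TinyEncoder().cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()          # handles fp16 loss scaling

for _ in range(10):                           # stand-in for the real data loader
    x = torch.randn(8, 256, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # forward in fp16 where numerically safe
        loss = model(x).pow(2).mean()         # dummy loss for illustration
    scaler.scale(loss).backward()             # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)                    # unscales grads, then steps the optimizer
    scaler.update()                           # adapt the loss scale for the next step
```

Gradient checkpointing trades extra compute in the backward pass for lower peak memory, so expect somewhat slower steps in exchange for fitting larger batches on 24GB cards.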

> I used 8 A100 GPUs with 40GB of memory each, and training takes 3-4 days on the 4M dataset. You may want to try fp16 training or gradient checkpointing to reduce memory usage.

Thanks a lot, I will try them.

2292384454 avatar Mar 06 '22 06:03 2292384454