
GPU usage increasing as training progresses

thechargedneutron opened this issue on Jun 13 '22 · 7 comments

Hi,

Thank you for the good work.

  1. How much GPU memory is required to train this model?
  2. I am currently using eight 32 GB GPUs to train the model. The memory usage increases as training progresses and ultimately exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

Thank you!

thechargedneutron · Jun 13 '22

Hi,

How much GPU memory is required to train this model?

We used RTX 3090 GPUs with 24 GB of memory.

I am currently using eight 32 GB GPUs to train the model. The memory usage increases as training progresses and ultimately exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

32 GB GPUs should be enough for training this model. If you are still facing this problem, you could consider using Adafactor instead of Adam as the optimizer. Also, depending on where the overflow occurs, it may help to reduce the batch size of the dataloader for the evaluation set: https://github.com/yixinL7/BRIO/blob/a32b78e87fb8282847ee8c1e17856c8dad8d906c/main.py#L370
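For reference, here is a minimal sketch of the optimizer swap, assuming the Adafactor implementation from Hugging Face transformers (the model variable and the learning rate are illustrative, not the exact values from main.py):

```python
# Minimal sketch: replacing Adam with the Hugging Face `transformers`
# Adafactor, which stores factored second moments and therefore needs
# far less optimizer-state memory than Adam.
from transformers.optimization import Adafactor

optimizer = Adafactor(
    model.parameters(),
    lr=2e-3,                # illustrative fixed learning rate
    scale_parameter=False,  # disable Adafactor's built-in LR scaling
    relative_step=False,    # use the fixed lr above, not relative steps
    warmup_init=False,      # must be False when relative_step is False
)
```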

Please let me know if you have more questions.

yixinL7 · Jun 15 '22

Can the model be trained with two 24 GB 3090Ti GPUs? My data comes from your pipeline: Generate Candidate Summaries --> Preprocess Your Own Data --> Train. When I tried, I found that even four GPUs could not train it; the number of candidate summaries (can_num) is 16. @yixinL7

RoyZhanyi · Jul 18 '22

Could you show me how to train the brio-cnndm-uncased model on a single 11 GB GPU?

tiennvcs · Jul 20 '22

Can the model be trained with two 24 GB 3090Ti GPUs?

Yes, you should be able to train the model using two GPUs, but you will need to increase the number of gradient accumulation steps to keep the same effective batch size.
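As a rough illustration (this is not the exact BRIO training loop), the effective batch size is per-GPU batch size × number of GPUs × accumulation steps, so halving the GPU count means doubling the accumulation steps:

```python
# Illustrative gradient accumulation loop, not the actual BRIO code:
# effective_batch = per_gpu_batch * num_gpus * accum_steps, so going
# from 4 GPUs to 2 GPUs means doubling accum_steps to compensate.
accum_steps = 8  # e.g. 4 steps on four GPUs -> 8 steps on two GPUs

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    loss = compute_loss(model, batch)  # hypothetical loss function
    (loss / accum_steps).backward()    # scale so gradients average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()               # one update per accumulation cycle
        optimizer.zero_grad()
```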

Could you show me how to train the brio-cnndm-uncased model on a single 11 GB GPU?

I'm not sure if there's a workaround for training the model on 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:

  1. Reduce the number of candidates used in training to a smaller number like 4; that should still be enough to improve the model's performance. https://github.com/yixinL7/BRIO/blob/main/config.py#L25
  2. Use gradient checkpointing (see the sketch below). I wouldn't really recommend this, because it slows down training a lot.
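A minimal sketch of option 2, assuming a Hugging Face transformers BART model (the method names come from that library, not from the BRIO code):

```python
# Minimal sketch, assuming a Hugging Face `transformers` BART model.
# Gradient checkpointing trades compute for memory: activations are
# recomputed during the backward pass instead of being stored.
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.gradient_checkpointing_enable()  # recompute activations in backward
model.config.use_cache = False         # KV cache conflicts with checkpointing
```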

yixinL7 · Jul 20 '22

Thank you for your reply. I reduced args.max_num from 16 to 2 and it worked well. I also changed args.total_len from 1024 to 512 to fit the 11 GB GPU.

Thank you again :)
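For anyone else with the same constraint, the edits described above would look roughly like this in config.py (the attribute names are from this thread; the function wrapper is hypothetical):

```python
# Hypothetical sketch of the config.py changes reported above; only
# max_num and total_len come from the thread, the wrapper is illustrative.
def low_memory_setting(args):
    args.max_num = 2      # number of candidate summaries (was 16)
    args.total_len = 512  # maximum input length in tokens (was 1024)
```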

tiennvcs · Jul 21 '22

Thank you for the great work. Could you please explain why we should increase the number of gradient accumulation steps when training on fewer GPUs?

ruili33 · Nov 01 '22

Hi, I'd like to ask how long it took you to train one epoch on the 11 GB GPU. Thanks!

hoboyu11 · Jan 15 '24