BRIO
GPU usage increasing as training progresses
Hi,
Thank you for the good work.
- What is the GPU size that is required to train this model?
- I am currently using eight 32 GB GPUs to train the model. The memory usage increases as training progresses and ultimately exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?
Thank you!
Hi,
What is the GPU size that is required to train this model?
We used RTX 3090 GPUs with 24 GB of memory.
I am currently using eight 32 GB GPUs to train the model. The memory usage increases as training progresses and ultimately exceeds 32 GB, causing an out-of-memory error. Is there a workaround for this? I see that the code has del statements to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?
32 GB GPUs should be enough for training the model. But if you are still facing this problem, you could consider using Adafactor instead of Adam as the optimizer. Also, depending on where the overflow occurs, it may help to reduce the batch size of the dataloader for the evaluation set. https://github.com/yixinL7/BRIO/blob/a32b78e87fb8282847ee8c1e17856c8dad8d906c/main.py#L370
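If it helps, here is a minimal sketch of what swapping the optimizer could look like, assuming the training script builds its optimizer from model.parameters() with torch.optim.Adam; the learning-rate value below is a placeholder, not the setting used in the paper:

```python
# Sketch: replace Adam with Adafactor to shrink optimizer-state memory.
# Adam keeps two extra fp32 buffers per parameter (momentum and variance);
# Adafactor stores factored second-moment statistics instead.
from transformers.optimization import Adafactor

optimizer = Adafactor(
    model.parameters(),
    lr=2e-3,                 # placeholder learning rate; keep your existing schedule
    scale_parameter=False,   # turn off Adafactor's internal lr scaling
    relative_step=False,     # use the externally supplied lr instead of a relative step
    warmup_init=False,
)
```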
Please let me know if you have more questions.
Can I train the model with two 24 GB 3090 Ti cards?
The data comes from your steps: Generate Candidate Summaries --> Preprocess Your Own Data --> Train.
When I tried, I found that even four cards were not enough to train; the number of candidate summaries (can_num) is 16. @yixinL7
Could you show me how to train the brio-cnndm-uncased model on a single 11 GB GPU?
Can I train the model with two 24 GB 3090 Ti cards?
The question is whether two 3090 Ti cards are enough for model training. Yes, you should be able to train the model using 2 GPUs, but you will need to increase the number of gradient accumulation steps to keep the same effective batch size.
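For reference, a minimal sketch of the usual gradient-accumulation pattern in PyTorch; accumulate_step here stands in for whatever accumulation setting the training config exposes, and model, dataloader, and optimizer are assumed to be already built (the model call follows the generic Hugging Face style):

```python
def train_with_accumulation(model, dataloader, optimizer, accumulate_step=4):
    # Effective batch size = per-GPU batch size * number of GPUs * accumulate_step,
    # so halving the number of GPUs means doubling accumulate_step.
    optimizer.zero_grad()
    for i, batch in enumerate(dataloader):
        loss = model(**batch).loss
        # Scale the loss so gradients summed over accumulate_step mini-batches
        # match the gradient of one large batch.
        (loss / accumulate_step).backward()
        if (i + 1) % accumulate_step == 0:
            optimizer.step()
            optimizer.zero_grad()
```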
Could you show me how to train the brio-cnndm-uncased model on a single 11 GB GPU?
I'm not sure there is a workaround for training the model on 11 GB. Unfortunately, an 11 GB GPU is barely enough for training the baseline model (BART). There are two things you could try:
- Reducing the number of candidates used in training to a smaller number, like 4. That should still be enough to improve the model's performance. https://github.com/yixinL7/BRIO/blob/main/config.py#L25
- Using gradient checkpointing (see the sketch after this list). I wouldn't really recommend it because it slows training down a lot.
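For the second option, here is a minimal sketch assuming the underlying generator is a Hugging Face BART model; gradient_checkpointing_enable is the standard transformers API, not anything specific to this repo:

```python
# Sketch: trade compute for memory with gradient checkpointing.
# Activations are recomputed during the backward pass, which is why training slows down.
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.gradient_checkpointing_enable()
model.config.use_cache = False  # caching is incompatible with checkpointing during training
```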
Thank you for your reply. I reduced args.max_num from 16 to 2 and it worked well. I also changed args.total_len from 1024 to 512 to fit the 11 GB GPU.
Thank you again :)
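In case it helps others, the changes were roughly the following (field names as referenced in this thread; the exact location in config.py may differ):

```python
# Memory-saving settings used to fit an 11 GB GPU (illustrative values).
args.max_num = 2      # number of candidate summaries per example (was 16)
args.total_len = 512  # maximum input length in tokens (was 1024)
```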
Thank you for the great work. Could you please explain why we should increase the number of gradient accumulation steps when training on multiple GPUs?
Hi, I'd like to ask how long it takes to train one epoch with an 11 GB GPU. Thanks!