
How to improve detr

Open zhangzhen119 opened this issue 1 year ago • 14 comments

Could you please share your improved code for BatchFormer on DETR? I would like to learn about the improvement to DETR.

zhangzhen119 avatar Oct 13 '22 13:10 zhangzhen119

Hi @zhangzhen119, thanks for your comment, and sorry for getting back to you late. The improvement in DETR is similar to Deformable-DETR. The difference between the two codebases is that Deformable-DETR uses a batch-first transformer, while in DETR the batch is in the second dimension. If you want to see the changes cleanly, you can diff against the original code as follows,

diff models/deformable_transformer.py <(curl https://raw.githubusercontent.com/fundamentalvision/Deformable-DETR/main/models/deformable_transformer.py)
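For context, a minimal sketch (my own illustration, not code from either repository) of the layout difference being described:

```python
import torch

# Deformable-DETR keeps the flattened features batch-first: (batch, tokens, channels).
feat_batch_first = torch.randn(2, 100, 256)

# The original DETR transformer keeps the batch in the second dimension:
# (tokens, batch, channels), i.e. just a transpose of the same tensor.
feat_batch_second = feat_batch_first.transpose(0, 1)  # (100, 2, 256)

# A BatchFormer-style module that attends across samples in the batch therefore
# has to operate on dim 0 in Deformable-DETR but on dim 1 in DETR.
print(feat_batch_first.shape, feat_batch_second.shape)
```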

For the code on DETR, I have shared the repository with you. It might include some code from my previous project (HOI compositional learning), and I have not cleaned it up. The code for BatchFormerV2 is provided in https://github.com/zhihou7/detr/blob/master/models/transformer.py. I will release a clean DETR code when I have time.

If you have further questions, feel free to ask. Regards,

zhihou7 avatar Oct 14 '22 07:10 zhihou7

Thank you very much for your help, and congratulations on the work you have done here.

zhangzhen119 avatar Oct 14 '22 07:10 zhangzhen119

My pleasure

zhihou7 avatar Oct 15 '22 10:10 zhihou7

Hello, while reproducing your improvement of DETR with BatchFormer, I found that you added some parameters to the main function. Are these the parameters for the best improvement in your paper? If I want to get the same improvement, how should I set these parameters? Sorry to bother you.

zhangzhen119 avatar Oct 19 '22 01:10 zhangzhen119

I set bf to 3, which is BatchFormerV2. That is because I use bf == 1 to indicate the experiment without shared prediction modules. base_bf is used for the baseline, so you can ignore it. start_idx is useless once you set insert_idx; I use insert_idx to indicate the layer of the transformer encoder where BatchFormer is inserted, and I set insert_idx to 0 (the first layer). use_checkpoint enables gradient checkpointing to reduce memory, because I usually have 4 16G V100s; for the DETR experiments I therefore set the batch size to 2. share_bf indicates that we share the BatchFormer across different layers. Interestingly, this does not degrade the performance too much.
The other parameters do not affect the performance; I do not use them. They exist only because I thought weight decay might affect the performance, based on my experience with BatchFormerV1.
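To summarize, here is a rough sketch of those settings in Python; the names are taken from this thread, and their exact spellings or defaults in the shared repository may differ:

```python
# Hypothetical settings mirroring the flags discussed above (names from this thread).
bf_args = dict(
    bf=3,                 # 3 = BatchFormerV2; 1 = variant without shared prediction modules
    base_bf=0,            # only used for the baseline, can be ignored
    insert_idx=0,         # insert BatchFormer at encoder layer 0 (the first layer)
    start_idx=0,          # ignored once insert_idx is set
    share_bf=False,       # optionally share the BatchFormer block across layers
    use_checkpoint=True,  # gradient checkpointing to fit the model on 16G V100s
    batch_size=2,         # per-GPU batch size used in the DETR experiments
)
```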

zhihou7 avatar Oct 19 '22 02:10 zhihou7

I am still quite new to the code, so I asked some relatively simple questions. I really appreciate your prompt and effective replies. I will run experiments and study based on your suggestions. I wish you even greater achievements.

zhangzhen119 avatar Oct 19 '22 02:10 zhangzhen119

Thanks. It is mainly because my code is too messy.

zhihou7 avatar Oct 19 '22 02:10 zhihou7

Excuse me, I used your BatchFormerV2 in a transformer structure similar to DETR, with the parameters set according to your suggestions, but in the end there is only about a 0.1 improvement. Is this improvement reasonable? Due to equipment limitations, I set the batch size to 4 and did not use the optimal batch size of 24 mentioned in your article. Is the small batch size the main reason for the small improvement? If I can only use a batch size of 4, are there any other possible solutions? Sorry to trouble you.

zhangzhen119 avatar Oct 25 '22 13:10 zhangzhen119

Hi, how many epochs did you train the network for? Could you provide the logs? Also, did you run the experiments on a single GPU with batch size 4, or on 4 GPUs with batch size 4 each?

Here is the baseline log: https://drive.google.com/file/d/1PrLn5SOeSbpW-UvjJgVCTLmOSkerIRRd/view?usp=sharing and here is the BatchFormer log: https://drive.google.com/file/d/1t60MSkNCv5eLOo-2TR0fYfbYTPQpcU_E/view?usp=sharing

The two logs are from training with batch size 16 on 8 GPUs. I do not implement multi-GPU distributed training; therefore, what matters is the batch size on a single GPU.

zhihou7 avatar Oct 26 '22 06:10 zhihou7

Sorry, my log files were not saved. I ran on a single GPU with batch size 4 and trained for 80 epochs; at epoch 17 the performance started to drop, so I stopped. My device is a 3060 Ti. I also adjusted other related parameters, but there was basically no change, so I suspect the effect is not achieved because the batch size is relatively small.

zhangzhen119 avatar Oct 26 '22 06:10 zhangzhen119

Do you mean the performance drops after 17 epochs? Do you use shared prediction modules? I mean the siamese stream.

zhihou7 avatar Oct 26 '22 09:10 zhihou7

Yes, my model also started to drop at the 17th epoch without BatchFormer, so I think this is normal. I used what you shared with me in DETR and then made the improvement. Sorry, I did not see where shared modules are used, but when I looked at the code you shared with me, I found that I had not restricted BatchFormer to the training phase only. That caused this problem, so I am going to use it only in the training phase and try again.

zhangzhen119 avatar Oct 26 '22 10:10 zhangzhen119

If you do not share the other modules in the network, you will suffer a performance drop when you do not use the BatchFormer in the test phase.

I copy the batch into the BatchFormerV2 stream, then input both the original feature batch and the feature batch processed by BatchFormerV2 into the subsequent modules.
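A minimal sketch of this two-stream idea, assuming a DETR-style (tokens, batch, channels) layout; this is my own simplified illustration, not the code from the shared transformer.py:

```python
import torch
import torch.nn as nn

class BatchFormerV2Stream(nn.Module):
    """Two-stream BatchFormerV2 sketch: the original batch and a batch-attended
    copy are both passed to the (shared) downstream prediction modules."""

    def __init__(self, dim=256, nhead=8):
        super().__init__()
        # One transformer encoder layer that will attend across the batch dimension.
        self.batch_former = nn.TransformerEncoderLayer(
            d_model=dim, nhead=nhead, dim_feedforward=4 * dim)

    def forward(self, feat, training=True):
        # feat: (tokens, batch, channels), DETR-style layout.
        if not training:
            # At test time the BatchFormer stream is dropped entirely; this only
            # works because the prediction modules are shared between both streams.
            return feat
        # Permute so the batch axis becomes the "sequence" axis of the encoder
        # layer: attention then mixes samples of the batch, separately per token.
        bf = self.batch_former(feat.permute(1, 0, 2))   # (batch, tokens, channels)
        bf = bf.permute(1, 0, 2)                        # back to (tokens, batch, channels)
        # Concatenate the original batch and the BatchFormer batch so the shared
        # heads see both streams during training (targets must be duplicated too).
        return torch.cat([feat, bf], dim=1)
```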

zhihou7 avatar Oct 26 '22 10:10 zhihou7

Ok, thank you, I'll try again. Sorry for the inconvenience.

zhangzhen119 avatar Oct 26 '22 10:10 zhangzhen119