
question about performance

Open baixiao930 opened this issue 4 years ago • 9 comments

I trained the model without any changes, using the default training strategy (set in run.sh), but only obtained 54.2% R@1. Could you please tell me how to reach the 61.5% R@1 reported in the paper? Should I change the training strategy or some hyper-parameters? Thanks a lot for your help.

baixiao930 avatar Feb 23 '21 13:02 baixiao930

> @baixiao930: I trained the model without any changes … but only obtained 54.2% R@1. […]

Thanks for your attention to our work. Two days ago, I ran the code again using four Tesla P40 GPUs, and I have uploaded the model and training log. Maybe you should use 4 GPUs to run the code; the previous experiments were done with 4 GPUs.
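
For reference, a minimal sketch of the usual way to run a PyTorch model on four GPUs with `torch.nn.DataParallel` (this is not the repository's run.sh; the linear layer and device IDs are placeholders for the real network and hardware):

```python
import os
# Pick four GPUs before anything touches CUDA; the IDs here are placeholders.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1,2,3")

import torch
import torch.nn as nn

model = nn.Linear(512, 256)  # placeholder for the actual NAFS network
if torch.cuda.device_count() > 1:
    # DataParallel splits every input batch across the visible GPUs
    # and gathers the outputs back on the first device.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()
```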

wettera avatar Mar 01 '21 06:03 wettera

Thanks for your reply. I ran the code again with 4 GPUs, but only a little improvement was obtained. I compared my training log with the one you uploaded: with exactly the same settings, my model only reaches 48% R@1 in the first 20 epochs, much lower than in your train.log, and I think this is what causes the inferior final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

baixiao930 avatar Mar 02 '21 05:03 baixiao930

> @baixiao930: I trained the model without any changes … but only obtained 54.2% R@1. […]

> @wettera: Two days ago, I ran the code again using four Tesla P40 GPUs. […]

Hi, thanks for sharing your brilliant work! I tried to reproduce your results using 4 GPUs, and I find that multi-GPU training with this code seems to require extra code in the loss module: if the per-GPU batch size drops (batch size = 16), the distributions p and q will be affected. Since you mentioned that you have run 4-GPU training, did you add extra code to handle this? I would appreciate it if you could share your ideas.
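
For reference, one possible way to keep a batch-level matching loss intact under `nn.DataParallel` would be to return only the embeddings from `forward()` and compute the loss on the gathered full batch, so the softmax over the batch still sees all 64 samples instead of a 16-sample shard. This is only an illustrative sketch; the encoder and the CMPM-style loss below are simplified stand-ins, not this repository's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Stand-in for the image/text branches: forward() returns embeddings
    only, so nothing batch-level is computed inside the replicated module."""
    def __init__(self, dim=256):
        super().__init__()
        self.img = nn.Linear(2048, dim)
        self.txt = nn.Linear(768, dim)

    def forward(self, img_feat, txt_feat):
        return self.img(img_feat), self.txt(txt_feat)

def matching_loss(img_emb, txt_emb, labels):
    """CMPM-style loss computed on the gathered full batch.
    p: predicted image-to-text distribution over the batch,
    q: normalized label-match distribution."""
    sim = F.normalize(img_emb, dim=1) @ F.normalize(txt_emb, dim=1).t()
    p = F.softmax(sim, dim=1)
    q = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    q = q / q.sum(dim=1, keepdim=True)
    return F.kl_div(p.log(), q, reduction="batchmean")

# Wrap only the encoder; DataParallel gathers its outputs onto the first GPU,
# so matching_loss sees the whole batch of 64, not a 16-sample shard.
encoder = Encoder()
if torch.cuda.device_count() > 1:
    encoder = nn.DataParallel(encoder).cuda()
```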

NovaMind-Z avatar Mar 02 '21 13:03 NovaMind-Z

> @baixiao930: I ran the code again with 4 GPUs, but only a little improvement was obtained. […]

Please check your environment. You can first test our model to see whether it achieves the same performance as reported in our training log. If it does not, feel free to contact us.

wettera avatar Mar 03 '21 02:03 wettera

> @baixiao930: I ran the code again with 4 GPUs, but only a little improvement was obtained. […]

When I try to run the code with 4 GPUs, it always goes wrong. What did you change to run with 4 GPUs? Can you share it with me? Thank you very much.

yyll1998 avatar Mar 12 '21 03:03 yyll1998

> @baixiao930: I ran the code again with 4 GPUs, but only a little improvement was obtained. […]

Have you achieved the 62% performance now? I use 4 RTX 3090s to train this model and meet the same problem as you.

NovaMind-Z avatar Mar 22 '21 12:03 NovaMind-Z

> @baixiao930: I ran the code again with 4 GPUs, but only a little improvement was obtained. […]

> @yyll1998: When I try to run the code with 4 GPUs, it always goes wrong. […]

Is your PyTorch version >= 1.5.0? I hit the StopIteration problem when using PyTorch 1.7.
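
For context, this StopIteration on PyTorch >= 1.5 typically comes from code that calls `next(self.parameters())` inside `forward()` (older pytorch_transformers BERT code does this to get the dtype for the attention mask): under `nn.DataParallel` the replicas no longer expose their parameters, so the iterator is empty. A minimal illustration of the pattern and a common workaround, assuming that is indeed the cause here (the toy module below is not the repository's code):

```python
import torch
import torch.nn as nn

class MaskedLayer(nn.Module):
    """Toy module showing the pattern that breaks under DataParallel."""
    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, mask):
        # Problematic on torch >= 1.5 DataParallel replicas:
        #   dtype = next(self.parameters()).dtype   -> StopIteration
        dtype = x.dtype  # workaround: take the dtype from an input tensor
        ext_mask = (1.0 - mask.to(dtype)) * -10000.0
        return self.proj(x) + ext_mask.unsqueeze(-1)

# Other options: pin torch/torchvision to the versions listed in the README,
# or move to a transformers release that avoids this pattern in forward().
```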

NovaMind-Z avatar Mar 22 '21 12:03 NovaMind-Z

> @baixiao930: I ran the code again with 4 GPUs, but only a little improvement was obtained. […]

> @yyll1998: When I try to run the code with 4 GPUs, it always goes wrong. […]

> @NovaMind-Z: Is your PyTorch version >= 1.5.0? I hit the StopIteration problem when using PyTorch 1.7.

Is the environment the same as the following:

  • Python 3.7
  • Pytorch 1.0.0 & torchvision 0.2.1
  • numpy
  • matplotlib (not necessary unless you need the result figure)
  • scipy 1.2.1
  • pytorch_transformers 1.2.0
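
A quick way to compare a local setup against this list before digging further into the accuracy gap (just a small sketch; it assumes the packages above are importable):

```python
import sys
import numpy
import scipy
import torch
import torchvision
import pytorch_transformers

# Print the installed versions next to the ones listed above.
print("python              :", sys.version.split()[0])
print("torch               :", torch.__version__)
print("torchvision         :", torchvision.__version__)
print("numpy               :", numpy.__version__)
print("scipy               :", scipy.__version__)
print("pytorch_transformers:", pytorch_transformers.__version__)
```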

wettera avatar Mar 23 '21 02:03 wettera

> @baixiao930: I ran the code again with 4 GPUs, but only a little improvement was obtained. […]

> @NovaMind-Z: Have you achieved the 62% performance now? […]

Hi, I'm currently experiencing the same problem as you: I can only reproduce about 55% performance. Have you solved the problem? Can you achieve the 62% performance now?

Video-AD avatar Apr 24 '22 02:04 Video-AD