FewShotDetection

Some problems about compiling environment

Open NHW2017 opened this issue 4 years ago • 7 comments

Hello! I am having problems running the program. I suspect they are caused by my environment configuration, so I would like to ask about yours: the Ubuntu version, the CUDA version, and the graphics card model.

NHW2017 avatar Nov 04 '20 05:11 NHW2017

Hi @NHW2017

Yes, the environment configuration is important, especially using the correct PyTorch and CUDA versions.

Two commands might be useful when setting up the environment:

conda install pytorch=0.4.1 cuda80 -c pytorch
conda install -c anaconda cudatoolkit=8.0

You probably need to compile everything with CUDA 8.0, and it should be OK on any Ubuntu version that supports it. In my case, I've tested on both Ubuntu 14.04 and 16.04 using a Titan X (12 GB). I also tested with PyTorch 0.4.0 and 0.4.1, and both work fine. However, I've never tried CUDA 10 or any version of PyTorch >= 1.0.
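
To confirm that an installed environment matches the versions mentioned above, a small check along these lines may help. It is only a sketch: the helper name `matches_tested_setup` is hypothetical and not part of the repository, and it assumes the versions reported by `torch.__version__` and `torch.version.cuda` start with the tested version strings.

```python
# Versions reported as tested in this thread (PyTorch 0.4.0 / 0.4.1, CUDA 8.0).
TESTED_PYTORCH = ("0.4.0", "0.4.1")
TESTED_CUDA_PREFIX = "8.0"


def matches_tested_setup(torch_version, cuda_version):
    """Hypothetical helper: True if both version strings match the tested setup."""
    torch_ok = any(torch_version.startswith(v) for v in TESTED_PYTORCH)
    # cuda_version is None for CPU-only builds of PyTorch.
    cuda_ok = cuda_version is not None and cuda_version.startswith(TESTED_CUDA_PREFIX)
    return torch_ok and cuda_ok


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch version:", torch.__version__)
        print("CUDA version PyTorch was built with:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
        print("Matches tested setup:",
              matches_tested_setup(torch.__version__, torch.version.cuda))
```

A mismatch does not prove the code will fail, only that the configuration differs from the one reported as working here.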

YoungXIAO13 avatar Nov 04 '20 11:11 YoungXIAO13

Thank you for your reply! I have another question. I used the same hardware and environment configuration as yours. One epoch on the VOC dataset takes about 100 minutes, so training 21 epochs at phase=1 takes about 35 hours, which seems too long. Does it take you this long under the same conditions? If not, there must be a problem with my environment.
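
The 35-hour figure follows directly from the per-epoch time. As a quick check of the arithmetic (the function name `total_training_hours` is made up for illustration):

```python
def total_training_hours(minutes_per_epoch, num_epochs):
    """Total wall-clock training time in hours, assuming every epoch takes the same time."""
    return minutes_per_epoch * num_epochs / 60.0


# Numbers from this comment: about 100 minutes per epoch, 21 epochs at phase=1.
print(total_training_hours(100, 21))  # 35.0
```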

NHW2017 avatar Nov 06 '20 13:11 NHW2017

Actually, the training time is correct. The base class training on VOC also took me around 30 hours. To accelerate, one solution would be implementing parallel training with multiple GPUs, which I haven't tried yet.

YoungXIAO13 avatar Nov 06 '20 13:11 YoungXIAO13

Thanks. How about COCO? How long does one training run on COCO take?

NHW2017 avatar Nov 06 '20 13:11 NHW2017

The base training on COCO would take ~10 days, which is unfortunately too long with only one 12 GB GPU. A workaround is to increase the batch size and reduce the number of training epochs linearly, say 4 times fewer epochs if you have a 48 GB GPU.
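
The linear scaling described above can be sketched as follows. The helper `scaled_schedule` and the base values (batch size 4, 20 epochs) are hypothetical placeholders, not the repository's actual settings, and the sketch assumes the usable batch size grows in proportion to GPU memory.

```python
def scaled_schedule(base_batch_size, base_epochs, base_mem_gb, new_mem_gb):
    """Scale the batch size up and the epoch count down by the GPU memory ratio."""
    factor = new_mem_gb / base_mem_gb               # e.g. 48 / 12 = 4.0
    new_batch_size = int(base_batch_size * factor)
    new_epochs = max(1, round(base_epochs / factor))
    return new_batch_size, new_epochs


# Hypothetical base schedule on a 12 GB GPU, rescaled for a 48 GB GPU.
print(scaled_schedule(4, 20, 12, 48))  # (16, 5)
```

When the batch size changes this much, the learning rate may also need adjusting; that is not covered in this thread.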

YoungXIAO13 avatar Nov 06 '20 13:11 YoungXIAO13

Sorry to interrupt. Have you tried adding focal loss, or another loss function that serves a similar purpose, to the code? How well did it work? Looking forward to your reply, thanks!

NHW2017 avatar Jan 27 '21 09:01 NHW2017

Hi @NHW2017

I haven't tried other loss functions and have no plans to do so in the near future. It would be interesting to study the effect.

YoungXIAO13 avatar Jan 27 '21 09:01 YoungXIAO13