HiSup

Training the model with another setup

Open minhvu120201dn opened this issue 1 year ago • 10 comments

I have tried training the model with:

  • RTX 2080 Super GPU with 8GB VRAM
  • Backbone: HRNetW48-V2
  • Number of epochs: 30
  • Dataset: AICrowd small

But I only obtained 52.0 AP, while the original paper reports 75.8. Can anyone explain why?

minhvu120201dn avatar Jun 20 '23 17:06 minhvu120201dn

We used all the training data containing 280,741 tiles for the final model. The small version was only utilized for ablation studies.

Best


cherubicXN avatar Jun 20 '23 23:06 cherubicXN

Hi author, I'm training on an RTX 2080 Ti (12 GB). Dataset: 20% of the original CrowdAI training set, about 60,000 images, using the crowdai-small_hrnet48.yaml config. But training still fails with a GPU memory overflow. Could you tell me the reason? Is it because my dataset is too big, or because your network is too big?

zem118 avatar Jul 25 '23 15:07 zem118

I am not sure what "a video memory overflow" is; presumably a GPU out-of-memory error. We never encountered this message in any of our experiments. Please first make sure you can run the demo and get reasonable results. Then I suggest two changes for training: reduce the batch size, which lowers GPU memory usage, or replace HRNet48 with a smaller backbone such as HRNet18.
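To make the first suggestion concrete, here is a rough sketch of the batch-size/memory trade-off: halve the batch size until the estimated footprint fits the card. The helper and all numbers below are purely illustrative (not HiSup code); real per-sample memory depends on tile size, backbone, and optimizer state.

```python
def fit_batch_size(per_sample_mb, budget_mb, start=16):
    """Halve the batch size until the estimated usage fits the GPU budget.

    per_sample_mb: rough GPU memory per training sample (illustrative).
    budget_mb: usable GPU memory after model weights and CUDA overhead.
    """
    bs = start
    while bs > 1 and bs * per_sample_mb > budget_mb:
        bs //= 2
    return bs

# Illustrative numbers only: ~600 MB per tile with HRNet48,
# ~7000 MB usable on an 8 GB card.
print(fit_batch_size(600, 7000))  # -> 8
```

In practice you would just edit the batch-size entry in crowdai-small_hrnet48.yaml and retry; the sketch only shows why halving is a reasonable search strategy.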

SarahwXU avatar Jul 26 '23 03:07 SarahwXU

Ok, thanks for the reply, I've solved the problem! It works successfully on a single GPU. Now my machine has two GPUs (3080s) and I want to train on both. I used multi-train.py from your repo for training, but the run gets stuck: it neither reports an error nor makes progress, and returns no information about the training process. I don't know why. Are there any additional steps you take when training with multiple GPUs?

zem118 avatar Jul 26 '23 12:07 zem118

Hi, is your CUDA compute capability compatible with your current PyTorch version?

XJKunnn avatar Jul 26 '23 12:07 XJKunnn

CUDA is compatible with PyTorch, and it runs successfully on a single GPU. It just doesn't run on dual GPUs; both GPUs sit idle.

zem118 avatar Jul 26 '23 12:07 zem118

Could you please share the log while running the training code?

XJKunnn avatar Jul 26 '23 12:07 XJKunnn

In the terminal, it gets stuck right after "index created! b855a1fa2358c0759a301aa47b3713a".
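A hang right after dataset indexing, with both GPUs idle, often means the distributed process group never finished initializing. One standard way to get more information is to set NCCL's debug environment variables before launching. These are real NCCL variables; setting them from Python as below is just one hedged way to do it (exporting them in the shell works equally well), and whether `multi-train.py` then reveals the cause is not guaranteed.

```python
import os

# Print NCCL initialization and transport logs so the hang point is visible.
os.environ["NCCL_DEBUG"] = "INFO"
# If the logs show a stall during peer-to-peer setup, disabling P2P is a
# common workaround on consumer cards:
os.environ["NCCL_P2P_DISABLE"] = "1"
# Skip the InfiniBand transport on a single desktop machine:
os.environ["NCCL_IB_DISABLE"] = "1"

# ...then launch the repo's multi-GPU training script as usual.
```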

zem118 avatar Jul 26 '23 13:07 zem118

I just ran the multi-GPU code and it runs well. (screenshot) Maybe you should check your environment carefully and follow the steps in the README file.

XJKunnn avatar Jul 26 '23 13:07 XJKunnn

Okay, thanks for the answer.

zem118 avatar Jul 26 '23 13:07 zem118