HiSup
Training the model with another setup
I have tried training the model with:
- RTX 2080 Super GPU with 8GB VRAM
- Backbone: HRNetV2-W48
- Number of epochs: 30
- Dataset: AICrowd small

But I only obtained 52.0 AP, while the original paper reports 75.8. Can anyone explain why?
We used the full training set of 280,741 tiles for the final model. The small version was only used for ablation studies.
Best,
Nan
Hi author, I'm training on an RTX 2080 Ti GPU with 12 GB of memory. The dataset I'm using is 20% of the original crowdAI set, with 60,000 images for training, and the network config is crowdai-small_hrnet48.yaml. But it keeps giving me a "video memory overflow" (GPU out-of-memory) error. May I ask what causes this: is my dataset too big, or is your network too big?
I am not sure what "a video memory overflow" is; we never encountered this error message in any of our experiments. Please first make sure that you can run the demo and get reasonable results. Then, I suggest trying the following changes during training. One is to reduce the batch size, which will reduce GPU memory usage. The other is to replace HRNet48 with a smaller backbone such as HRNet18.
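If it helps, here is a minimal sketch of what the batch-size change looks like in plain PyTorch. The dataset, model, and numbers below are illustrative placeholders, not HiSup's actual training code; in HiSup the batch size is set through the yaml config rather than in code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for HiSup's real dataset and model, just to keep the
# snippet self-contained and runnable on any CUDA machine.
dataset = TensorDataset(torch.randn(64, 3, 300, 300), torch.randint(0, 2, (64,)))
model = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()

# Halving the batch size roughly halves the activation memory per step,
# at the cost of more iterations per epoch.
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=2)

images, _ = next(iter(loader))
out = model(images.cuda())

# Peak GPU memory allocated so far; useful for checking whether the
# reduction actually brings you under your card's limit.
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```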
OK, thanks for the reply, I've solved the problem! It now works successfully on a single GPU. My computer has two GPUs (RTX 3080) and I want to train on both, so I used the multi-train.py script from your model, but the run gets stuck: it neither reports an error nor continues, and it returns no information about the training process. I don't know why this is. So I would like to ask whether you do any additional steps when training with multiple GPUs?
Hi, is your GPU's CUDA compute capability compatible with your current PyTorch version?
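A quick way to check this is with a few standard PyTorch calls (nothing HiSup-specific):

```python
import torch

# PyTorch build and the CUDA version it was compiled against.
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# Compute capability of each visible GPU; prebuilt PyTorch wheels only
# ship kernels for a fixed range of capabilities.
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")
```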
CUDA is compatible with PyTorch, and it runs successfully on a single GPU. It just doesn't run on dual GPUs (both GPUs stay idle).
Could you please share the log output from running the training code?
In the terminal, it gets stuck at "index created!".
I just ran the multi-GPU code and it runs well. Maybe you should check your environment carefully and follow the steps in the README file.
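You could also run a quick NCCL sanity check outside HiSup to see whether basic multi-GPU communication works at all. This is a generic PyTorch distributed sketch, not part of HiSup (the filename is just an example); launch it on one machine with two GPUs via torchrun --nproc_per_node=2 nccl_check.py, and set NCCL_DEBUG=INFO for verbose logs:

```python
import os
import torch
import torch.distributed as dist

# torchrun supplies RANK, WORLD_SIZE, and LOCAL_RANK via the environment,
# so init_process_group needs no explicit addresses here.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# If this all_reduce hangs the same way your training does, the problem
# is in the NCCL/driver setup rather than in the training code itself.
t = torch.ones(1, device="cuda")
dist.all_reduce(t)
print(f"rank {dist.get_rank()}: all_reduce ok, sum = {t.item()}")

dist.destroy_process_group()
```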
Okay, thanks for the answer.