BiSeNet
Thank you for this implementation
I ran the training code and tested the performance of BiSeNet-v1 and BiSeNet-v2. The single-scale mIoU is 75.30% for v1 and 74.18% for v2. For the FPS test, I did not convert the model to TensorRT; I ran demo.py directly for 1000 iterations. I also removed the auxiliary segmentation heads for computational efficiency. The resulting FPS is 58.32 (768-1536) for v1 and 115 (512-1024) for v2, which is a little lower than the figures in the papers. Thanks @CoinCheung for your work, which is very enlightening. My environment is as follows: Python 3.7, torch 1.6.0, torchvision 0.7.0, CUDA 10.1.
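For anyone who wants to reproduce an FPS figure like this, here is a minimal timing sketch. It is not the code in demo.py; the function name `measure_fps` and its parameters are made up for illustration, and the warm-up count is an assumption.

```python
import time


def measure_fps(run_once, n_iters=1000, n_warmup=10, sync=None):
    """Return calls per second for run_once, averaged over n_iters calls.

    run_once: zero-argument callable (hypothetical stand-in for one forward pass).
    sync: optional zero-argument callable that blocks until queued work is done.
    """
    # Warm-up calls are not timed (first calls often include one-off setup cost).
    for _ in range(n_warmup):
        run_once()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    if sync is not None:
        sync()  # wait for pending work before stopping the clock
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without a GPU or a model.
    fps = measure_fps(lambda: sum(range(1000)), n_iters=1000)
    print(f"{fps:.2f} iterations/s")
```

With PyTorch on a GPU, `run_once` would be a forward pass under `torch.no_grad()` and `sync` would be `torch.cuda.synchronize`, since CUDA kernels are launched asynchronously and the timer would otherwise stop too early.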
Thanks for sharing this!
Please note that, to stay compatible with TensorRT 7.0, I replaced interpolation with a pixel-shuffle operation, which requires the preceding conv layers to have more filters. This adds parameters and slows the model down a little. If running in Python is enough for you, you can remove these pixel-shuffles and switch back to interpolate, which in theory would make the model more lightweight.
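To make the parameter cost concrete: a pixel-shuffle with upscale factor r turns r*r input channels into one output channel at r times the resolution, so the conv before it needs r*r times as many filters as it would with interpolation. A rough count, where the layer sizes (128 input channels, 19 classes, 8x upsampling, 1x1 conv) are illustrative assumptions and not taken from this repo:

```python
def conv_params(c_in, c_out, k, bias=True):
    """Learnable parameters in a k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def head_params(c_in, n_classes, up_factor, k=1, use_pixel_shuffle=True):
    """Parameters of the last conv in a segmentation head (hypothetical layout).

    With pixel-shuffle the conv must emit n_classes * up_factor**2 channels;
    with interpolation it emits n_classes channels and upsampling is parameter-free.
    """
    c_out = n_classes * up_factor ** 2 if use_pixel_shuffle else n_classes
    return conv_params(c_in, c_out, k)


if __name__ == "__main__":
    # Assumed sizes, for illustration only.
    shuffle = head_params(128, 19, 8, use_pixel_shuffle=True)
    interp = head_params(128, 19, 8, use_pixel_shuffle=False)
    print("pixel-shuffle head:", shuffle)
    print("interpolate head:  ", interp)
    print("ratio:", shuffle / interp)
```

The ratio equals up_factor squared for this last layer only; the effect on the whole model is much smaller because the rest of the network is unchanged.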
Thank you! I have noticed that! But recently I found another problem that I can't specify GPUs, which is very weird. For example, when I run CUDA_VISIBLE_DEVICES=4,5 python -m torch.distributed.launch --nproc_per_node=2 tools/train.py --model bisenetv2, the program still runs on GPU 6,7. I also tried setting os.environ["CUDA_VISIBLE_DEVICES"] = '4,5', but the problem still exists. This problem has bothered me for two days. I don't really understand the working mechanism of torch.distributed.launch, so please advise me if you see any problems. Thanks!
It is an import problem. I have solved it.
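The comment above does not say what the import problem was. One common cause of this symptom, assumed here and not confirmed for this case, is that CUDA_VISIBLE_DEVICES gets set after some imported module has already initialized CUDA, at which point the variable is no longer read. A minimal sketch of the safe ordering (the helper `visible_gpu_count` is made up for illustration):

```python
import os

# Set the variable before anything can initialize CUDA.
# Assumption: in the failing case, a module imported earlier touched the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5"

try:
    import torch  # imported only after the variable is set
except ImportError:
    torch = None  # lets the sketch run on a machine without PyTorch


def visible_gpu_count():
    """Number of GPUs this process can see, or None if torch is unavailable."""
    if torch is None:
        return None
    return torch.cuda.device_count()


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
    print("visible GPUs:", visible_gpu_count())
```

Inside the process the two selected cards are renumbered, so physical GPUs 4 and 5 show up as cuda:0 and cuda:1. Setting the variable on the command line, as in the command above, is inherited by the workers that torch.distributed.launch starts, unless something in the code overwrites it.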
Good to know that you solved your problem. I left this open so that other people can see your performance test results.
May I ask which GPU you used to get "115 (512-1024) for v2"?