mseg-semantic MSeg: universal demo takes 10 minutes for 1 image, why?

Hello pros,

I am currently trying to implement the algorithm from an image enhancement paper. As part of that, we need to run MSeg segmentation on real and rendered images/datasets. I have around 50-60k images.

The dependencies MSeg-api and MSeg_semantic are already installed. I tried the Google Colab first and then copied its commands so that I could also run the script on my Linux machine. The command looks like this:

python -u mseg_semantic/tool/universal_demo.py
--config="default_config_360.yaml"
model_name mseg-3m
model_path mseg-3m.pth
input_file /home/luda1013/PfD/image/try_images

The weights I used were downloaded via the Google Colab, i.e. mseg-3m-1080.pth. MSeg-log

For me, however, it takes about 10 minutes per image, and all I get in temp_files is a grayscale version of the image. Could someone help me solve this problem? Thank you :)

Apr 19 '23 09:04 luda1013

My setup:

Ubuntu 20.04.6 LTS
Core i9-10980XE @ 3 GHz
GPU (nvidia-smi): NVIDIA RTX A5000 (48 GB)
CUDA 11.7
PyTorch version: 2.0.0
CUDA at: /usr/local/cuda*

Apr 19 '23 10:04 luda1013

Hi @luda1013, thanks for your interest in our MSeg models. It sounds like your GPU is not being utilized or recognized by your PyTorch installation. Can you please verify the following:

1. Can you run nvidia-smi during inference to confirm that the GPU is actually being detected by PyTorch? Several GB of GPU RAM should be shown as being utilized when you start inference.
2. Have you tried running one of your images in the Colab listed in our README? There, we run the mseg-3m-1080p.pth model with config "mseg-semantic/mseg_semantic/config/test/default_config_360_ms.yaml", from https://github.com/mseg-dataset/mseg-semantic/releases/download/v0.1/mseg-3m-1080p.pth, and inference takes less than 10 sec on each of the demo images, even on an old Tesla T4 GPU provided for free on Colab.
3. What resolution are your input images? The demo images in the Colab have resolutions of (1080, 1920, 3), (1080, 1728, 3), (1080, 1920, 3), and (500, 990, 3) pixels.
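
For the GPU check, a quick test from inside Python can complement nvidia-smi. The snippet below is only a minimal sketch, assuming it is run in the same Python environment as the demo script; describe_gpu is a hypothetical helper name, not part of mseg-semantic. It reports whether PyTorch was built with CUDA support and whether it can see a device.

```python
def describe_gpu():
    """Return a small report on whether PyTorch can see a CUDA device.

    Hypothetical helper for illustration; not part of mseg-semantic.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this environment at all.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in describe_gpu().items():
        print(f"{key}: {value}")
```

If cuda_available comes back False, or built_with_cuda is None, the PyTorch build is likely CPU-only or cannot find a matching driver, which would be consistent with inference taking minutes per image.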

Jan 06 '24 16:01 mseg-dataset