
MONAI Label should take advantage of all GPUs while running inference

Open · AHarouni opened this issue 2 years ago

**Is your feature request related to a problem?** When we run MONAI Label with 1 model and 4 GPUs, it should be the server's responsibility to determine which GPU to use. Currently the user specifies in the request which GPU to use; in my mind the user can be a radiologist using the OHIF interface who has no clue whether there are even any GPUs in the back end.

**Describe the solution you'd like** MONAI Label would start with a parameter such as gpu=0,2, or all GPUs by default. MONAI Label should spin up workers as inference requests come in, so that each worker runs inference on a different GPU and all GPUs are used. MONAI Label should also handle the GPUs for batch inference requests (a new feature in 0.7), although I think that already uses all GPUs. Somehow MONAI Label needs to manage the GPUs available to it in a smart way; a sketch of the idea is shown below.
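A hypothetical sketch of the requested behavior, assuming a simple round-robin scheduler on the server side (the `GpuRoundRobin` class and its parameters are illustrative only, not part of MONAI Label's API):

```python
# Hypothetical sketch (not MONAI Label code): the server, not the client,
# picks a device for each incoming inference request by cycling through the
# GPUs it was started with (e.g. gpu=0,2 or "all").
import itertools

import torch


class GpuRoundRobin:
    """Assigns each incoming request to the next GPU in turn."""

    def __init__(self, gpus="all"):
        if gpus == "all":
            ids = list(range(torch.cuda.device_count()))
        else:
            ids = [int(g) for g in gpus.split(",")]
        # Fall back to CPU if no GPU is visible.
        self._cycle = itertools.cycle(ids or [None])

    def next_device(self) -> str:
        gpu = next(self._cycle)
        return "cpu" if gpu is None else f"cuda:{gpu}"


# Usage: the server would call this when a request arrives, so the
# radiologist in OHIF never has to know about devices.
scheduler = GpuRoundRobin(gpus="0,2")
request = {"model": "segmentation", "image": "spleen_10"}  # illustrative values
request["device"] = scheduler.next_device()
```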

**Describe alternatives you've considered** Start MONAI Label with a --worker flag that depends on the number of GPUs. This creates multiple processes that have consecutive PIDs. In my app I added:

import os
from typing import Any, Callable, Dict, Sequence, Tuple, Union

from monai.transforms import EnsureTyped, LoadImaged

# Import paths below are from recent MONAI Label releases; older versions may differ.
from monailabel.interfaces.tasks.infer_v2 import InferType
from monailabel.tasks.infer.basic_infer import BasicInferTask


class SAMMPF(BasicInferTask):
    """
    This provides Inference Engine for pre-trained spleen segmentation (UNet) model over MSD Dataset.
    """

    def __init__(
        self,
        path,
        network=None,
        target_spacing=(1.5, 1.5, 1.5),
        type=InferType.SEGMENTATION,
        labels=None,
        dimension=3,
        description="A test",
        **kwargs,
    ):
        super().__init__(
            path=path,
            network=network,
            type=type,
            labels=labels,
            dimension=dimension,
            description=description,
            **kwargs,
        )

        # Set to the number of GPUs, which equals the number of server workers.
        NUMBER_OF_WORKERS = 4
        # Workers have consecutive PIDs, so PID modulo the worker count gives
        # each worker process its own GPU index.
        pid = os.getpid()
        self.gpu_no = str(pid % NUMBER_OF_WORKERS)

    def __call__(
        self, request, callbacks=None
    ) -> Union[Dict, Tuple[str, Dict[str, Any]]]:
        # Override whatever device the client requested with this worker's GPU.
        request["device"] = "cuda:" + self.gpu_no
        return super().__call__(request, callbacks)

    def pre_transforms(self, data=None) -> Sequence[Callable]:
        return [
            LoadImaged(keys="image"),
            # ... other pre-transforms elided ...
            EnsureTyped(keys="image", device="cuda:" + self.gpu_no),
        ]

**Additional context** For batch inference (new in 0.7) we do start workers and loop over all the GPUs. This is great and is the correct behavior, until inference requests start coming in.

For example

  • Say the batch infer for 1k patients will take 10 hours using all 4 GPUs.
  • Now if 4 users start sending infer requests for the same model on new patients, will they wait 10 hours until the batch infer finishes? You could say the batch infer should use all GPUs minus one, but what if no user sends any requests? Then you have wasted that last GPU (a sketch of this "reserve one GPU" policy follows below).
  • Somehow MONAI Label needs to be able to scale down the workers for batch inference.

Another option would be to return a descriptive response that the server is busy doing batch work: try again in X hours.
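A minimal sketch of the "all GPUs minus one" policy mentioned in the list above; the `batch_infer_gpus` helper is purely illustrative and not MONAI Label code:

```python
# Hypothetical policy sketch: decide how many GPUs a batch-infer job may use,
# optionally keeping one free for interactive (single-image) requests.
import torch


def batch_infer_gpus(reserve_for_interactive: bool = True) -> list:
    """Return the GPU indices a batch job should use under this policy."""
    total = torch.cuda.device_count()
    if total == 0:
        return []  # CPU only
    if reserve_for_interactive and total > 1:
        return list(range(total - 1))  # keep the last GPU for single requests
    return list(range(total))


print(batch_infer_gpus())  # e.g. [0, 1, 2] on a 4-GPU machine
```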

AHarouni avatar Jul 10 '23 18:07 AHarouni

This problem needs an enterprise-grade solution, where each API needs to scale independently.

However, for single inference we can make use of the GPUs if available; that fix shall be added soon.

SachidanandAlle avatar Jul 10 '23 20:07 SachidanandAlle

Hello, have you solved the problem now?

E6B3BD avatar Aug 01 '24 03:08 E6B3BD

Hi, this issue has been fixed for a while; I just forgot to close it. For batch inference, MONAI Label opens multiple threads, each using a GPU and running inference.

AHarouni avatar Aug 12 '24 15:08 AHarouni

Hello, how did you solve it? I have 2x 4090 GPUs and am unable to get parallel inference working. I hope I can get your help. Thank you.

E6B3BD avatar Aug 13 '24 04:08 E6B3BD

The logic to use the multiple GPUs available is here: https://github.com/Project-MONAI/MONAILabel/blob/main/monailabel/interfaces/tasks/batch_infer.py#L96-L113. This was released in 0.7, I think.

You should be able to trigger batch infer for all images / unlabeled images and, in your request, set max_workers to the number of GPUs you have (a sketch of such a request is below).
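For illustration, a minimal sketch of such a request, assuming a MONAI Label server at http://127.0.0.1:8000 and a model named "segmentation"; the exact endpoint path and parameter names are assumptions and should be checked against the server's /docs (OpenAPI) page:

```python
# Minimal sketch: trigger batch inference with max_workers set to the GPU count.
# Server address, model name, endpoint path, and parameter names are assumptions.
import requests

MONAI_LABEL_SERVER = "http://127.0.0.1:8000"  # assumed server address
MODEL = "segmentation"                         # assumed model name

resp = requests.post(
    f"{MONAI_LABEL_SERVER}/batch/infer/{MODEL}",  # confirm path via /docs
    params={"images": "unlabeled"},               # or "all"
    json={"max_workers": 2},                      # e.g. 2 for a 2x 4090 machine
)
resp.raise_for_status()
print(resp.json())
```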

AHarouni avatar Aug 13 '24 17:08 AHarouni