IOPaint
GPU inference speed is slower than CPU
CPU inference: | INFO | main:process:67 - process time: 2742.7728176116943ms
GPU inference: | INFO | main:process:67 - process time: 5317.4756874159873ms
I use the big-lama model for inference. My CPU inference time is about 2000+ ms, but GPU inference takes 5000+ ms. Is that normal?
That's abnormal. The GPU runs a little slower the first time, but it should definitely be faster than the CPU. What is your graphics card?
GTX 3080Ti, and I only printed the inference time, not including the time to load the model.
What is your testing script like? Have you installed the CUDA version of PyTorch correctly?
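A minimal sanity check along these lines can confirm that a CUDA build of PyTorch is installed and the card is visible:

import torch

print(torch.__version__)               # should include a +cuXXX suffix for a CUDA build
print(torch.cuda.is_available())       # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the 3080 Ti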
1. The testing script's core processing logic is as follows:

def process(self, origin_image_bytes, mask_bytes, form):
    image, alpha_channel = load_img(origin_image_bytes)
    original_shape = image.shape
    interpolation = cv2.INTER_CUBIC
    size_limit: Union[int, str] = form.get("sizeLimit", "1080")
    if size_limit == "Original":
        size_limit = max(image.shape)
    else:
        size_limit = int(size_limit)
    config = Config(
        ldm_steps=form["ldmSteps"],
        hd_strategy=form["hdStrategy"],
        hd_strategy_crop_margin=form["hdStrategyCropMargin"],
        hd_strategy_crop_trigger_size=form["hdStrategyCropTrigerSize"],
        hd_strategy_resize_limit=form["hdStrategyResizeLimit"],
    )
    image = resize_max_size(image, size_limit=size_limit, interpolation=interpolation)
    mask, _ = load_img(mask_bytes, gray=True)
    mask = resize_max_size(mask, size_limit=size_limit, interpolation=interpolation)

    start = time.time()
    res_np_img = self.model(image, mask, config)
    logger.info(f"process time: {(time.time() - start) * 1000}ms")
    torch.cuda.empty_cache()

    if alpha_channel is not None:
        if alpha_channel.shape[:2] != res_np_img.shape[:2]:
            alpha_channel = cv2.resize(
                alpha_channel, dsize=(res_np_img.shape[1], res_np_img.shape[0])
            )
        res_np_img = np.concatenate(
            (res_np_img, alpha_channel[:, :, np.newaxis]), axis=-1
        )

    ext = self.get_image_ext(origin_image_bytes)
    return numpy_to_bytes(res_np_img, ext)
form = {
    "ldmSteps": "25",
    "hdStrategy": "Crop",
    "hdStrategyCropMargin": "128",
    "hdStrategyCropTrigerSize": "512",
    "hdStrategyResizeLimit": "768"
}
#############################################################
model_name = "lama"
device = "cuda"
2. Environment: torch 2.1.1+cu121, Python 3.11.5
I used the latest lama-cleaner in CUDA mode and the inference speed is normal, almost half of the CPU time. What difference between the code above and the latest lama-cleaner causes the CUDA inference in my code to be so slow? Thank you!
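One thing worth ruling out when comparing these numbers: CUDA kernels launch asynchronously, so a plain time.time() around the model call may not measure what you expect. A minimal sketch of a synchronized timing wrapper (time_inference is only an illustrative helper, not part of the script above):

import time
import torch

def time_inference(model, image, mask, config, device="cuda"):
    # Synchronize before and after so the interval covers the full GPU work,
    # not just the asynchronous kernel launches.
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    result = model(image, mask, config)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"process time: {(time.time() - start) * 1000:.1f}ms")
    return result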
I seem to have encountered this problem before. You can try adding these lines of code at the beginning of the file.
https://github.com/Sanster/IOPaint/blob/a26bf7a0c9dc0f8faad77aa2591793ad20724831/iopaint/api.py#L14
import torch
try:
torch._C._jit_override_can_fuse_on_cpu(False)
torch._C._jit_override_can_fuse_on_gpu(False)
torch._C._jit_set_texpr_fuser_enabled(False)
torch._C._jit_set_nvfuser_enabled(False)
except:
pass
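If it still looks slow after that, it may also help to time a few runs after a warm-up pass, since the first CUDA inference pays one-time costs (context creation, allocations). A rough sketch, reusing the model, image, mask and config objects from the script above:

import time
import torch

# Warm-up run: absorbs one-time CUDA costs before measuring.
_ = model(image, mask, config)
torch.cuda.synchronize()

times = []
for _ in range(5):
    torch.cuda.synchronize()
    start = time.time()
    _ = model(image, mask, config)
    torch.cuda.synchronize()
    times.append((time.time() - start) * 1000)

print(f"median of 5 runs: {sorted(times)[2]:.1f}ms")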
thanks!