IOPaint
GPU inference speed is slower than CPU
CPU inference: | INFO | main:process:67 - process time: 2742.7728176116943ms
GPU inference: | INFO | main:process:67 - process time: 5317.4756874159873ms
I use the big-lama model for inference. My CPU inference time is about 2000+ ms, but GPU inference takes 5000+ ms. Is that normal?
That's abnormal. The GPU runs a little slower the first time, but it should definitely be faster than the CPU. What is your graphics card?
GTX 3080Ti, and I only printed the inference time, not including the time to load the model.
What is your testing script like? Have you installed the CUDA version of PyTorch correctly?
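A minimal sanity check along these lines can confirm that a CUDA build of PyTorch is installed and the card is visible:

import torch

print(torch.__version__)               # should include a +cuXXX suffix for a CUDA build
print(torch.cuda.is_available())       # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the 3080 Ti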
1. The testing script's core processing logic is as follows:

def process(self, origin_image_bytes, mask_bytes, form):
    image, alpha_channel = load_img(origin_image_bytes)
    original_shape = image.shape
    interpolation = cv2.INTER_CUBIC
    size_limit: Union[int, str] = form.get("sizeLimit", "1080")
    if size_limit == "Original":
        size_limit = max(image.shape)
    else:
        size_limit = int(size_limit)
    config = Config(
        ldm_steps=form["ldmSteps"],
        hd_strategy=form["hdStrategy"],
        hd_strategy_crop_margin=form["hdStrategyCropMargin"],
        hd_strategy_crop_trigger_size=form["hdStrategyCropTrigerSize"],
        hd_strategy_resize_limit=form["hdStrategyResizeLimit"],
    )
    image = resize_max_size(image, size_limit=size_limit, interpolation=interpolation)
    mask, _ = load_img(mask_bytes, gray=True)
    mask = resize_max_size(mask, size_limit=size_limit, interpolation=interpolation)

    start = time.time()
    res_np_img = self.model(image, mask, config)
    logger.info(f"process time: {(time.time() - start) * 1000}ms")
    torch.cuda.empty_cache()

    if alpha_channel is not None:
        if alpha_channel.shape[:2] != res_np_img.shape[:2]:
            alpha_channel = cv2.resize(
                alpha_channel, dsize=(res_np_img.shape[1], res_np_img.shape[0])
            )
        res_np_img = np.concatenate(
            (res_np_img, alpha_channel[:, :, np.newaxis]), axis=-1
        )

    ext = self.get_image_ext(origin_image_bytes)
    return numpy_to_bytes(res_np_img, ext)
form = {
    "ldmSteps": "25",
    "hdStrategy": "Crop",
    "hdStrategyCropMargin": "128",
    "hdStrategyCropTrigerSize": "512",
    "hdStrategyResizeLimit": "768"
}
#############################################################
model_name = "lama"
device = "cuda"
2. Environment: torch 2.1.1+cu121, Python 3.11.5
I used the latest lama-cleaner in CUDA mode and the inference speed is normal, almost half of the CPU time. What difference between the code above and the latest lama-cleaner causes the CUDA inference in my code to be so slow? Thank you!
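One thing worth ruling out when comparing these numbers: CUDA kernels launch asynchronously, so a plain time.time() around the model call may not measure what you expect. A minimal sketch of a synchronized timing wrapper (time_inference is only an illustrative helper, not part of the script above):

import time
import torch

def time_inference(model, image, mask, config, device="cuda"):
    # Synchronize before and after so the interval covers the full GPU work,
    # not just the asynchronous kernel launches.
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    result = model(image, mask, config)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"process time: {(time.time() - start) * 1000:.1f}ms")
    return result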
I seem to have encountered this problem before. You can try adding these lines of code at the beginning of the file.
https://github.com/Sanster/IOPaint/blob/a26bf7a0c9dc0f8faad77aa2591793ad20724831/iopaint/api.py#L14
import torch
try:
torch._C._jit_override_can_fuse_on_cpu(False)
torch._C._jit_override_can_fuse_on_gpu(False)
torch._C._jit_set_texpr_fuser_enabled(False)
torch._C._jit_set_nvfuser_enabled(False)
except:
pass
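If it still looks slow after that, it may also help to time a few runs after a warm-up pass, since the first CUDA inference pays one-time costs (context creation, allocations). A rough sketch, reusing the model, image, mask and config objects from the script above:

import time
import torch

# Warm-up run: absorbs one-time CUDA costs before measuring.
_ = model(image, mask, config)
torch.cuda.synchronize()

times = []
for _ in range(5):
    torch.cuda.synchronize()
    start = time.time()
    _ = model(image, mask, config)
    torch.cuda.synchronize()
    times.append((time.time() - start) * 1000)

print(f"median of 5 runs: {sorted(times)[2]:.1f}ms")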
thanks!