performance degradation in to_pil_image after v0.17
🐛 Describe the bug
torchvision.transforms.functional.to_pil_image is much slower at converting torch.float16 image tensors to PIL Images, based on my benchmarks (serializing 360 images):
Dependencies:
Python 3.11
Pillow 10.4.0
Before (torch 2.0.1, torchvision v0.15.2, Code here): 23 seconds
After (torch 2.2.0, torchvision v0.17, Code here): 53 seconds
How to reproduce:
import time

import torch
from torchvision.transforms.functional import to_pil_image

rand_img_tensor = torch.rand(3, 512, 512, dtype=torch.float16)

start_time = time.time()
for _ in range(50):
    pil_img = to_pil_image(rand_img_tensor)
end_time = time.time()
print(end_time - start_time)  # seconds
Run the script above with each set of dependencies listed, and the time difference is apparent.
The cause seems to be this PR.
Most of the extra time is spent on this line:
if np.issubdtype(npimg.dtype, np.floating) and mode != "F":
    npimg = (npimg * 255).astype(np.uint8)
I think it's due to the multiplication using numpy primitives rather than torch (and also astype instead of torch.Tensor.byte()).
Thanks for the report @seymurkafkas .
> I think it's due to the multiplication using numpy primitives rather than torch (and also astype instead of torch.Tensor.byte()).
Ah, if that's the case then the fix might be non-trivial, since it means we'd have to go from a unified numpy logic to a unified pytorch logic. I'm happy to consider a PR if we can keep the code simple enough.
Out of curiosity, why do you need to convert tensors back to PIL, and more specifically, why do you need that part to be fast?
Thanks for the response! I will take a look and submit a PR if possible.
> why do you need to convert tensors back to PIL and why do you need that part to be fast?
This is to reduce inference costs for our ML app; less time spent on serialization implies more GPU utilization. We convert to PIL because we use it before serializing to disk.
Thanks for replying! Just so you know, and in case it's helpful: you may be able to use the encode_jpeg() or encode_png() utilities of torchvision! https://pytorch.org/vision/stable/io.html#image-encoding
Thanks a lot for the tip :) I will experiment with those too.