performance degradation in to_pil_image after v0.17
🐛 Describe the bug
torchvision.transforms.functional.to_pil_image is much slower at converting torch.float16 image tensors to PIL Images, based on my benchmarks (serializing 360 images):
Dependencies:
Python 3.11
Pillow 10.4.0
Before (torch 2.0.1, torchvision v0.15.2, Code here): 23 seconds
After (torch 2.2.0, torchvision v0.17, Code here): 53 seconds
How to reproduce:
import time

import torch
from torchvision.transforms.functional import to_pil_image

rand_img_tensor = torch.rand(3, 512, 512, dtype=torch.float16)

start_time = time.time()
for _ in range(50):
    pil_img = to_pil_image(rand_img_tensor)
end_time = time.time()
print(end_time - start_time)  # seconds
Run the script above with each set of dependencies listed, and the time difference is apparent.
The cause seems to be this PR.
Most of the extra time is spent on this line:
if np.issubdtype(npimg.dtype, np.floating) and mode != "F":
    npimg = (npimg * 255).astype(np.uint8)
I think it's due to the multiplication using numpy primitives rather than torch (and also astype instead of torch.Tensor.byte()).
Thanks for the report @seymurkafkas .
> I think it's due to the multiplication using numpy primitives rather than torch (and also astype instead of torch.Tensor.byte()).
Ah, if that's the case then the fix might be non-trivial, since it means we'd have to go from a unified numpy logic to a unified pytorch logic. I'm happy to consider a PR if we can keep the code simple enough.
Out of curiosity, why do you need to convert tensors back to PIL, and more specifically, why do you need that part to be fast?
Thanks for the response! I will take a look and submit a PR if possible.
> why do you need to convert tensors back to PIL and why do you need that part to be fast?
This is to reduce inference costs for our ML app; less time spent on serialization implies more GPU utilization. We convert to PIL because we use it before serializing to disk.
Thanks for replying! Just so you know, and in case it's helpful: you may be able to use the encode_jpeg() or encode_png() utilities of torchvision! https://pytorch.org/vision/stable/io.html#image-encoding
Thanks a lot for the tip :) I will experiment with those too.