decode_jpeg() creates tensors with different stride from PIL.Image.open()
Opening a file with `torchvision.io.decode_jpeg()` creates a tensor with different strides than opening it with PIL and converting to a tensor (see snippet below).
This can cause a very significant difference in training time because of https://github.com/pytorch/pytorch/issues/83840. Whether or not that issue deserves a "fix", this is probably not the only transform that is sensitive to strides, and it might be worth changing what we output in `decode_jpeg()`.
@datumbox @vfdev-5 this is something to keep in mind when you'll benchmark the new transforms: the strides matter a ton.
```python
import torch
from torchvision.io import decode_jpeg, read_file
from torchvision.transforms import ToTensor
from PIL import Image

filepath = "./test/assets/encode_jpeg/grace_hopper_517x606.jpg"  # 606 x 517

print(torch.randint(0, 256, (3, 606, 517), dtype=torch.uint8).stride())  # (313302, 517, 1)
print(ToTensor()(Image.open(filepath)).stride())  # (313302, 517, 1)
print(decode_jpeg(read_file(filepath)).stride())  # (1, 1551, 3) -- this makes Resize() 8X slower because antialias is False by default.
```
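For what it's worth, a possible user-side workaround is to call `.contiguous()` on the decoded tensor, which copies it into the channels-first layout that the PIL path produces. The sketch below simulates the decoded layout with `permute()` rather than an actual JPEG, so it runs without the test asset:

```python
import torch

# Simulate decode_jpeg()'s output layout: a (3, H, W) uint8 tensor whose
# underlying memory is channels-last (HWC), hence strides (1, W*3, 3).
h, w = 606, 517
img = torch.empty(h, w, 3, dtype=torch.uint8).permute(2, 0, 1)
print(img.stride())  # (1, 1551, 3) -- channels-last memory

# .contiguous() copies the data into the channels-first layout that
# ToTensor()(Image.open(...)) produces, at the cost of one extra copy.
img_cf = img.contiguous()
print(img_cf.stride())  # (313302, 517, 1)
```

This doesn't address the underlying question of what `decode_jpeg()` should output by default, but it restores the stride pattern the transforms currently perform best on.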
> it might be worth changing what we output in decode_jpeg()
Perhaps this is something @fmassa has some opinion on?
@NicolasHug I wonder if this is also related to https://github.com/pytorch/vision/issues/4880 ? Especially, to https://github.com/pytorch/vision/pull/4898 (this PR was reverted)
Yes, this issue seems to explain OP's original question 2.
I guess "fixing https://github.com/pytorch/pytorch/issues/83840" seems to go in @fmassa 's suggested direction:

> I would revert this PR, and work on improving the performance on the transforms instead
I'll close this, since a lot has happened since this issue was opened. We should still be mindful of the input/output formats of the transforms and decoders, but this issue no longer provides anything actionable.