decode_jpeg() creates tensors with different stride from PIL.Image.open()
Opening a file with `torchvision.io.decode_jpeg()` creates a tensor with different strides than opening it with PIL and converting to a tensor (see snippet below).
This can cause a very significant difference in training time because of https://github.com/pytorch/pytorch/issues/83840. Whether or not that issue deserves a "fix", this is probably not the only transform that is sensitive to strides, and it might be worth changing what we output in `decode_jpeg()`.
@datumbox @vfdev-5 this is something to keep in mind when you'll benchmark the new transforms: the strides matter a ton.
```python
import torch
from torchvision.io import decode_jpeg, read_file
from torchvision.transforms import ToTensor
from PIL import Image

filepath = "./test/assets/encode_jpeg/grace_hopper_517x606.jpg"  # 606 x 517

print(torch.randint(0, 256, (3, 606, 517), dtype=torch.uint8).stride())  # (313302, 517, 1)
print(ToTensor()(Image.open(filepath)).stride())  # (313302, 517, 1)
print(decode_jpeg(read_file(filepath)).stride())  # (1, 1551, 3) -- this makes Resize() 8X slower because antialias is False by default.
```
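For what it's worth, a possible user-side workaround is to call `.contiguous()` on the decoded tensor, which copies it into the channels-first layout that the PIL path produces. The sketch below simulates the decoded layout with `permute()` rather than an actual JPEG, so it runs without the test asset:

```python
import torch

# Simulate decode_jpeg()'s output layout: a (3, H, W) uint8 tensor whose
# underlying memory is channels-last (HWC), hence strides (1, W*3, 3).
h, w = 606, 517
img = torch.empty(h, w, 3, dtype=torch.uint8).permute(2, 0, 1)
print(img.stride())  # (1, 1551, 3) -- channels-last memory

# .contiguous() copies the data into the channels-first layout that
# ToTensor()(Image.open(...)) produces, at the cost of one extra copy.
img_cf = img.contiguous()
print(img_cf.stride())  # (313302, 517, 1)
```

This doesn't address the underlying question of what `decode_jpeg()` should output by default, but it restores the stride pattern the transforms currently perform best on.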
> it might be worth changing what we output in decode_jpeg()
Perhaps this is something @fmassa has some opinion on?
@NicolasHug I wonder if this is also related to https://github.com/pytorch/vision/issues/4880 ? Especially, to https://github.com/pytorch/vision/pull/4898 (this PR was reverted)
Yes, this issue seems to explain OP's original question 2.
I guess "fixing https://github.com/pytorch/pytorch/issues/83840" seems to go in @fmassa 's suggested direction:

> I would revert this PR, and work on improving the performance on the transforms instead
I'll close this, since a lot has happened since this issue was opened. We should still be mindful of the input/output formats of the transforms and decoders, but this issue no longer provides anything actionable.