Broken reading for some GIFs
🐛 Describe the bug
I have encountered an issue where using torchvision.io.read_image (as well as decode_image and decode_gif) on certain GIF files produces invalid output. Here is a minimal reproduction:
import cv2
from torchvision.io import read_image
gif = read_image('sample.gif').numpy().transpose(0, 2, 3, 1)
image = gif[15]  # frame at index 15 (the 16th frame)
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
cv2.imshow('read_torch', image)
cv2.waitKey()
Sample of buggy GIF:
Its source: https://i.pinimg.com/originals/a0/02/a3/a002a3e51a2adc85d6c0a4684892e743.gif
Output:
Comparing with PIL (Pillow)
I understand that this may be an underlying GIFLIB issue that will not be fixed. However, since the behavior is unexpected and could impact users, it might be beneficial to include a note in the docs.
Versions
I found this behavior on both a MacBook and a Linux server. They have different setups, but both use the latest versions:
torch==2.6.0
torchvision==0.21.0
Hey @abionics, Thanks for reporting the issue, and apologies for the delayed response.
After investigating, I suspect this behavior is due to how GIF animations are optimized. Many GIFs store only the parts of the image that change between frames. These incremental frames are then layered over the previous ones during playback. As a result, to accurately display any given frame, you need to reconstruct it by compositing it with all the preceding frames. This Stack Overflow thread provides a good overview: Incomplete images from decoded GIF.
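The compositing described above can be illustrated with a minimal pure-Python sketch. It is not torchvision's or GIFLIB's implementation: the function name `composite_frames`, the patch layout `(left, top, rows)`, and the use of `None` for transparent pixels are all assumptions made for illustration.

```python
def composite_frames(width, height, patches, background=(0, 0, 0)):
    """Rebuild full frames from incremental patches (illustrative only).

    Each patch is a hypothetical (left, top, rows) tuple, where rows is a
    list of pixel rows and None marks a transparent pixel.
    """
    # The canvas persists across frames, so earlier content shows through.
    canvas = [[background] * width for _ in range(height)]
    frames = []
    for left, top, rows in patches:
        for dy, row in enumerate(rows):
            for dx, pixel in enumerate(row):
                if pixel is not None:  # keep the old pixel where transparent
                    canvas[top + dy][left + dx] = pixel
        # Snapshot the canvas so later patches do not alter earlier frames.
        frames.append([r[:] for r in canvas])
    return frames


RED, BLUE = (255, 0, 0), (0, 0, 255)
full = (0, 0, [[RED] * 3 for _ in range(3)])   # first frame covers everything
delta = (1, 1, [[BLUE, None]])                 # second frame changes one pixel
frames = composite_frames(3, 3, [full, delta])
```

In this sketch the second frame still shows the red pixels from the first one, which is the behavior a viewer expects. Decoding each patch on its own would show only the changed region.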
In our case, I suspect the issue is that we're not properly reconstructing the full frame before displaying it. Indeed, the first frame renders correctly, as it doesn't rely on any prior frames. Does my analysis make sense to you?
import cv2
from torchvision.io import read_image
gif = read_image('sample.gif').numpy().transpose(0, 2, 3, 1)
image = gif[0]  # first frame
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
cv2.imshow('read_torch', image)
cv2.waitKey()
Hi @AntoineSimoulin, thank you for the detailed response and investigation!
It makes sense and looks like the real cause. Additionally, I uploaded this GIF to an online splitter and tried different modes, here is a link. The "Redraw every frame with details from previous frames" mode gives the correct result, while the "Ignore optimizations" mode shows only the changed parts (as read_image does).
I think I understand this one: it has to do with the DisposalMode, which controls how subsequent frames are drawn.
The GIF that's not displaying correctly has DisposalMode == DISPOSAL_UNSPECIFIED. PyTorch Vision defaults to the "Background" drawing mode for this type of GIF, while the majority of browsers and libraries default to DISPOSE_DO_NOT.
So to match the behaviour of Pillow, it would need to do the equivalent of this:
https://github.com/python-pillow/Pillow/blob/6b4bb79b44b3bde6a25a33b6733358d409930854/src/PIL/GifImagePlugin.py#L366
With this change to decode_gif.cpp it displays perfectly
Before
https://github.com/pytorch/vision/blob/3c5a9afb2f0a8ba0850247697384d8e585a25ebb/torchvision/csrc/io/image/cpu/decode_gif.cpp#L129-L140
After
if (i > 0 &&
    (gcb.DisposalMode == DISPOSAL_UNSPECIFIED ||
     gcb.DisposalMode == DISPOSE_DO_NOT ||
     gcb.DisposalMode == DISPOSE_PREVIOUS)) {
  out[i] = out[i - 1];
}
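For readers more at home in Python, the decision made by the condition above can be sketched as follows. This is only an illustration of the logic, not the torchvision code. The helper `starting_canvas` and the `Disposal` enum are hypothetical names, and the numeric values are assumed to follow the GIF89a disposal method codes (0 to 3).

```python
from enum import IntEnum


class Disposal(IntEnum):
    # Assumed to mirror the GIF89a disposal method values.
    UNSPECIFIED = 0
    DO_NOT = 1
    BACKGROUND = 2
    PREVIOUS = 3


# Modes for which the new frame starts from the previous output frame,
# mirroring the three cases in the C++ condition above.
CARRY_OVER = frozenset(
    {Disposal.UNSPECIFIED, Disposal.DO_NOT, Disposal.PREVIOUS}
)


def starting_canvas(index, mode, previous_frame, background_frame):
    """Pick the canvas that frame `index` is drawn onto (hypothetical helper)."""
    if index > 0 and mode in CARRY_OVER:
        return [row[:] for row in previous_frame]
    # The first frame and BACKGROUND disposal start from the background.
    return [row[:] for row in background_frame]


prev = [[1, 1], [1, 1]]
bg = [[0, 0], [0, 0]]
unspecified = starting_canvas(1, Disposal.UNSPECIFIED, prev, bg)
cleared = starting_canvas(1, Disposal.BACKGROUND, prev, bg)
first = starting_canvas(0, Disposal.DO_NOT, prev, bg)
```

Like the snippet above, the sketch treats DISPOSE_PREVIOUS the same way as DISPOSE_DO_NOT. That is a simplification rather than full GIF89a disposal handling.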
Would be interested in doing a PR for this if you're happy with the analysis.
Thanks for looking into this @sg3-141-592! I remember not being sure which mode to choose here when writing the GIF decoder. We'd love to review a PR. If our existing tests still pass with your changes, then this is definitely the right fix. Thank you!
@sg3-141-592 thank you for the analysis! It aligns with the reason that @AntoineSimoulin mentioned. I think your fix should resolve this issue.
Just merged the fix in #9241. Thanks @sg3-141-592 for submitting the PR and @abionics for opening and discussing the issue!