PyAV
UnicodeDecodeError with binary tags in mov files
Overview
av.open raises a UnicodeDecodeError when opening some .mov files.
Expected behavior
The file should open without raising, even if a metadata tag contains binary data.
Actual behavior
Traceback:
Traceback (most recent call last):
  File "asd.py", line 153, in __init__
    container = av.open(path, "r")
  File "av\container\core.pyx", line 365, in av.container.core.open
  File "av\container\input.pyx", line 70, in av.container.input.InputContainer.__cinit__
  File "av\utils.pyx", line 31, in av.utils.avdict_to_dict
  File "av\utils.pyx", line 14, in av.utils._decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Investigation
I noticed the problem when trying to open various .mov files. I think the problem might be the binary data in com.apple.quicktime.artwork tags.
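The error message is at least consistent with that guess: a JPEG image starts with the bytes 0xFF 0xD8, and 0xFF can never start a UTF-8 sequence. Here is a minimal sketch that reproduces the same message without PyAV, assuming the artwork is a JPEG (the payload below is made up for illustration and is not taken from a real file):

```python
# Hypothetical artwork payload: JPEG start-of-image marker plus filler bytes.
artwork = b"\xff\xd8\xff\xe0" + b"\x00" * 8

error = None
try:
    # Decoding the raw tag value as UTF-8 text is what fails in the traceback.
    artwork.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

If the tag held something other than a JPEG, the offending byte and position could differ.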
Reproduction
Just call av.open("asd.mov", "r").
I have some example files, but cannot post them publicly.
Versions
- OS: Windows 7 x64
- PyAV runtime:
PyAV v7.0.2.dev0
git origin: git@github.com:mikeboers/PyAV
git commit: v6.2.0-132-gd9bebbd
library configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
library license: GPL version 3 or later
libavcodec 58. 54.100
libavdevice 58. 8.100
libavfilter 7. 57.100
libavformat 58. 29.100
libavutil 56. 31.100
libswresample 3. 5.100
libswscale 5. 5.100
Research
I have done the following:
- [x] Checked the PyAV documentation
- [x] Searched on Google
- [x] Searched on Stack Overflow
- [x] Looked through old GitHub issues
- [ ] Asked on PyAV Gitter
- [ ] ... and waited 72 hours for a response.
You're going to have to provide an example file, for which you hold the copyright, otherwise there is nothing we can do here.
Is there any documentation about binary tags I can read somewhere? I was convinced metadata only held strings, but if that's not the case we will probably need a different approach. One idea might be to allow passing a MetadataEncoder class which would need to implement these methods:
from typing import Any


class MetadataEncoder:
    def decode(self, key: str, value: bytes) -> Any:
        ...

    def encode(self, key: str, value: Any) -> bytes:
        ...
The default implementation would be the one we currently have, with encode turning str into bytes and decode doing the opposite.
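To make the idea concrete, here is a rough sketch of what such a default could look like, plus a variant that leaves a known binary tag alone. None of this exists in PyAV: the class names, the constructor arguments, and the BINARY_KEYS set are all hypothetical.

```python
from typing import Any


class MetadataEncoder:
    """Hypothetical default: treat every value as UTF-8 text."""

    def __init__(self, encoding: str = "utf-8", errors: str = "strict") -> None:
        self.encoding = encoding
        self.errors = errors

    def decode(self, key: str, value: bytes) -> Any:
        # bytes -> str, raising UnicodeDecodeError on binary data when strict.
        return value.decode(self.encoding, self.errors)

    def encode(self, key: str, value: Any) -> bytes:
        # str -> bytes, the inverse of decode.
        return str(value).encode(self.encoding, self.errors)


class BinaryAwareEncoder(MetadataEncoder):
    """Hypothetical variant that passes known binary tags through untouched."""

    # Assumed list of binary tags; a real list would have to come from the spec.
    BINARY_KEYS = frozenset({"com.apple.quicktime.artwork"})

    def decode(self, key: str, value: bytes) -> Any:
        if key in self.BINARY_KEYS:
            return value
        return super().decode(key, value)

    def encode(self, key: str, value: Any) -> bytes:
        if key in self.BINARY_KEYS:
            return bytes(value)
        return super().encode(key, value)
```

With something like this, callers who only care about text tags would keep the current behaviour, and callers with binary tags could opt in to a different encoder.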
There is a table of Apple metadata keys here: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/Metadata/Metadata.html#//apple_ref/doc/uid/TP40000939-CH1-SW43
com.apple.quicktime.artwork seems to be a binary blob, and com.apple.quicktime.rating.user is even a big-endian Float32. So probably the only easy solution would be to return bytes and have the user do all the decoding manually, or to provide a MetadataEncoder, which seems like a good idea.
Otherwise a mapping for tag names to data types would be needed.
The above docs mention value types, which would be great for mapping. But I don't know where they are stored.
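A mapping from tag names to data types could look roughly like the sketch below. It assumes the raw tag bytes reach Python unchanged, which I have not verified. The DECODERS table and decode_tag helper are hypothetical names, and only the two keys mentioned above are covered.

```python
import struct
from typing import Any, Callable, Dict

# Hypothetical per-key decoders; anything not listed falls back to UTF-8 text.
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    # Binary blob: hand the bytes back as-is.
    "com.apple.quicktime.artwork": bytes,
    # Big-endian 32-bit float.
    "com.apple.quicktime.rating.user": lambda raw: struct.unpack(">f", raw)[0],
}


def decode_tag(key: str, raw: bytes) -> Any:
    """Decode one metadata value according to its key."""
    decoder = DECODERS.get(key)
    if decoder is None:
        return raw.decode("utf-8")
    return decoder(raw)


# Example: a user rating of 4.5 stored as a big-endian Float32.
rating = decode_tag("com.apple.quicktime.rating.user", struct.pack(">f", 4.5))
```

The drawback is that the table has to be maintained by hand unless the value types can be read from the file itself.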
EDIT: I managed to create a simple mp4 file with the com.apple.quicktime.artwork tag which shows this error: com.apple.quicktime.artwork.zip
Just a reminder that I attached an example file a while back in case this was missed.
Thanks @Dobatymo. Unfortunately, it seems both of us primary maintainers are really distracted with paid work.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Just commenting since I don't want this issue to become stale...
From perusing the code, it looks as though metadata_errors may control this. Even though it is documented as only being used for encoding, it's used for decoding as well.
Try:
av.open(path, "r", metadata_errors="ignore")
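For reference, this is how the standard Python error handlers behave on bytes that are not valid UTF-8. It is a sketch that assumes metadata_errors is passed straight through as the errors argument of bytes.decode, which is how I read the code but have not confirmed. The sample bytes are made up.

```python
# Made-up tag value: two bytes that are invalid as UTF-8, then plain ASCII.
raw = b"\xff\xd8abc"

# "ignore" silently drops the undecodable bytes.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD for them instead.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored), repr(replaced))
```

Either way the original binary payload is lost, so this only gets the file to open. It does not make the artwork readable.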
With av==10.0.0 the file can be opened, but the metadata seems to be simply ignored, even with metadata_errors="strict".
From perusing the code, looks as though metadata_errors may control this. Even though it is documented as only being used for encoding, it's also used for decoding as well. Try: av.open(path, "r", metadata_errors="ignore")
Thanks, that's really helpful.