UnicodeDecodeError with binary tags in mov files

Open Dobatymo opened this issue 5 years ago • 7 comments

Overview

A UnicodeDecodeError is raised when opening some files.

Expected behavior

It should not fail.

Actual behavior

Traceback:

Traceback (most recent call last):
  File "asd.py", line 153, in __init__
    container = av.open(path, "r")
  File "av\container\core.pyx", line 365, in av.container.core.open
  File "av\container\input.pyx", line 70, in av.container.input.InputContainer.__cinit__
  File "av\utils.pyx", line 31, in av.utils.avdict_to_dict
  File "av\utils.pyx", line 14, in av.utils._decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Investigation

I noticed the problem when trying to open various .mov files. I think the problem might be the binary data in com.apple.quicktime.artwork tags.
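
The error message in the traceback is consistent with that guess. A minimal sketch, assuming the artwork payload is a JPEG image (an assumption; the affected files are not public): JPEG data begins with the bytes FF D8, and 0xFF is never a valid first byte in UTF-8, so a plain decode fails at position 0. The variable names are illustrative only.

```python
# Hypothetical stand-in for an artwork tag value: the first four bytes of a
# JPEG/JFIF file. A real tag value would contain the whole image.
artwork_value = b"\xff\xd8\xff\xe0"

try:
    artwork_value.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same kind of failure as in the traceback: byte 0xff at position 0.
    message = str(exc)
    print(message)
```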

Reproduction

Just call av.open("asd.mov", "r"). I have some example files, but cannot post them publicly.

Versions

  • OS: Windows 7 x64
  • PyAV runtime:
PyAV v7.0.2.dev0
git origin: git@github.com:mikeboers/PyAV
git commit: v6.2.0-132-gd9bebbd
library configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
library license: GPL version 3 or later
libavcodec     58. 54.100
libavdevice    58.  8.100
libavfilter     7. 57.100
libavformat    58. 29.100
libavutil      56. 31.100
libswresample   3.  5.100
libswscale      5.  5.100

Dobatymo avatar Apr 22 '20 06:04 Dobatymo

You're going to have to provide an example file, for which you hold the copyright, otherwise there is nothing we can do here.

jlaine avatar Apr 22 '20 07:04 jlaine

Is there any documentation about binary tags I can read somewhere? I was convinced metadata only held strings, but if that's not the case we will probably need a different approach. One idea might be to allow passing a MetadataEncoder class which would need to implement these methods:

from typing import Any


class MetadataEncoder:
    def decode(self, key: str, value: bytes) -> Any:
        ...

    def encode(self, key: str, value: Any) -> bytes:
        ...

The default implementation would be the one we currently have, with encode turning str into bytes and decode doing the opposite.
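
To make the proposal concrete, here is a minimal sketch of what such a default could look like, plus one possible variant that returns raw bytes when a value is not valid text. This is not PyAV API: the constructor parameters and the BytesFallbackEncoder name are assumptions made up for illustration.

```python
from typing import Any


class MetadataEncoder:
    """Sketch of the default described above: str <-> bytes."""

    def __init__(self, encoding: str = "utf-8", errors: str = "strict") -> None:
        # Both parameters are hypothetical; they are not part of the proposal.
        self.encoding = encoding
        self.errors = errors

    def decode(self, key: str, value: bytes) -> Any:
        return value.decode(self.encoding, self.errors)

    def encode(self, key: str, value: Any) -> bytes:
        return value.encode(self.encoding, self.errors)


class BytesFallbackEncoder(MetadataEncoder):
    """Hypothetical variant: return the raw bytes when decoding fails."""

    def decode(self, key: str, value: bytes) -> Any:
        try:
            return super().decode(key, value)
        except UnicodeDecodeError:
            return value

    def encode(self, key: str, value: Any) -> bytes:
        # Values that were left as bytes are written back unchanged.
        if isinstance(value, bytes):
            return value
        return super().encode(key, value)
```

With this variant, a text tag still comes back as str, while a binary tag such as an image blob comes back as bytes instead of raising.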

jlaine avatar Apr 30 '20 12:04 jlaine

There is a table of Apple metadata keys here: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/Metadata/Metadata.html#//apple_ref/doc/uid/TP40000939-CH1-SW43

com.apple.quicktime.artwork seems to be a binary blob, and com.apple.quicktime.rating.user is even a big-endian (BE) Float32. So the only easy solutions are probably either to return bytes and have the user do all the decoding manually, or to provide a MetadataEncoder, which seems like a good idea.

Otherwise a mapping for tag names to data types would be needed.

The above docs mention value types, which would be great for mapping. But I don't know where they are stored.
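
As a rough illustration of the mapping idea, the sketch below keys a decoder function on the tag name. It assumes the raw tag bytes are handed over unchanged (for the rating, exactly four bytes); whether that is actually the case is not established in this thread. DECODERS and decode_tag are hypothetical names, and only the two keys mentioned above are listed.

```python
import struct
from typing import Any, Callable, Dict

# Hypothetical mapping from tag name to a decoder for its raw bytes.
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    # Binary blob: keep the image data as bytes.
    "com.apple.quicktime.artwork": bytes,
    # Big-endian 32-bit float, assuming the value is exactly 4 raw bytes.
    "com.apple.quicktime.rating.user": lambda raw: struct.unpack(">f", raw)[0],
}


def decode_tag(key: str, raw: bytes) -> Any:
    """Decode one tag value, falling back to UTF-8 text for unknown keys."""
    decoder = DECODERS.get(key)
    if decoder is not None:
        return decoder(raw)
    return raw.decode("utf-8")
```

A table like this would have to be maintained by hand for every binary key, which is the main drawback compared with returning bytes and letting the caller decide.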

EDIT: I managed to create a simple mp4 file with the com.apple.quicktime.artwork tag which shows this error: com.apple.quicktime.artwork.zip

Dobatymo avatar May 28 '20 04:05 Dobatymo

Just a reminder that I attached an example file a while back in case this was missed.

Dobatymo avatar Oct 14 '20 04:10 Dobatymo

Thanks @Dobatymo. Unfortunately, it seems both of us primary maintainers are really distracted with paid work.

mikeboers avatar Oct 14 '20 14:10 mikeboers

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 25 '22 03:07 github-actions[bot]

Just commenting since I don't want this issue to become stale...

Dobatymo avatar Jul 26 '22 06:07 Dobatymo

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 25 '22 02:11 github-actions[bot]

Just commenting since I don't want this issue to become stale...

Dobatymo avatar Nov 25 '22 03:11 Dobatymo

From perusing the code, looks as though metadata_errors may control this. Even though it is documented as only being used for encoding, it's also used for decoding as well. Try: av.open(path, "r", metadata_errors="ignore")
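
For context, the sketch below shows what the standard Python error handlers do to bytes that are not valid UTF-8. It assumes that the metadata_errors value ends up as the errors argument of a bytes.decode call, which is what the suggestion above implies but which is not verified here. It uses only the standard library, not PyAV.

```python
# Two bytes that are invalid as UTF-8, followed by plain ASCII text.
raw = b"\xff\xd8ok"

# "ignore" silently drops the undecodable bytes.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# "surrogateescape" keeps the bytes recoverable via lone surrogates.
escaped = raw.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")

print(repr(ignored), repr(replaced), repr(escaped), roundtrip == raw)
```

Note that "ignore" avoids the exception but discards the binary content, so it is a workaround for opening the file rather than a way to read the artwork.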

nrhodes avatar Jan 25 '23 22:01 nrhodes

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 26 '23 02:05 github-actions[bot]

With av==10.0.0 the file can be opened, but the metadata seems to be simply ignored, even with metadata_errors="strict".

Dobatymo avatar Jun 27 '23 07:06 Dobatymo

From perusing the code, looks as though metadata_errors may control this. Even though it is documented as only being used for encoding, it's also used for decoding as well. Try: av.open(path, "r", metadata_errors="ignore")

Thanks, that's really helpful.

CheshireCC avatar Dec 27 '23 18:12 CheshireCC