exif-py

Fails for WebP with EXIF at the end of the image file

Open mvaranda opened this issue 5 years ago • 2 comments

The current _find_webp_exif expects "EXIF" to be located at a hard-coded position at the beginning of the file (right after VP8X).

However, WebP metadata can be located at the end of the file. In the attached file, metadata_android.webp, the EXIF chunk is located at offset 0x00148540.
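To see where the metadata sits in a given file, the top-level chunks can be listed in order. The following is a minimal sketch, not exifread code: `iter_riff_chunks` and `_chunk` are hypothetical helpers, and the sample bytes are a tiny synthetic file rather than the attached image. It assumes the RIFF rule that odd-sized payloads are followed by one pad byte.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_riff_chunks(fh: BinaryIO) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (fourcc, payload_offset, payload_size) for each top-level chunk."""
    header = fh.read(12)
    if len(header) != 12 or header[0:4] != b"RIFF" or header[8:12] != b"WEBP":
        raise ValueError("not a RIFF/WEBP file")
    while True:
        chunk_header = fh.read(8)
        if len(chunk_header) < 8:
            return  # end of file (or truncated trailing bytes)
        fourcc, size = struct.unpack("<4sL", chunk_header)
        yield fourcc, fh.tell(), size
        # Payloads are padded to an even length; skip the payload and pad byte.
        fh.seek(size + (size & 1), 1)


def _chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Serialize one chunk, adding the pad byte when the payload length is odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return fourcc + struct.pack("<L", len(payload)) + payload + pad


# Tiny synthetic file: VP8X header, fake image data, then EXIF as the last chunk.
body = (
    b"WEBP"
    + _chunk(b"VP8X", bytes([0x08]) + bytes(9))
    + _chunk(b"VP8 ", b"\x00" * 5)  # odd size to exercise the padding rule
    + _chunk(b"EXIF", b"MM\x00\x2a\x00\x00\x00\x08")
)
sample = b"RIFF" + struct.pack("<L", len(body)) + body

chunks = list(iter_riff_chunks(io.BytesIO(sample)))
for fourcc, offset, size in chunks:
    print(fourcc, hex(offset), size)
```

Running the same generator over a real file opened with `open(path, "rb")` should show whether EXIF comes before or after the image data.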

metadata_android.webp

Logs:

(base) OTT-MarceloMac:sample_photos mvaranda$ EXIF.py --debug metadata_android.webp 
INFO   Opening: metadata_android.webp
DEBUG  WebP format recognized in data[0:4], data[8:12]
Traceback (most recent call last):
  File "/Users/mvaranda/opt/anaconda3/bin/EXIF.py", line 129, in <module>
    main()
  File "/Users/mvaranda/opt/anaconda3/bin/EXIF.py", line 96, in main
    img_file, stop_tag=args.stop_tag, details=args.detailed, strict=args.strict, debug=args.debug
  File "/Users/mvaranda/opt/anaconda3/lib/python3.7/site-packages/exifread/__init__.py", line 266, in process_file
    }[endian])
KeyError: 'E'

Beginning of the file:

(base) OTT-MarceloMac:sample_photos mvaranda$ xxd -g1 -l256  metadata_android.webp

00000000: 52 49 46 46 14 fa 47 00 57 45 42 50 56 50 38 58  RIFF..G.WEBPVP8X
00000010: 0a 00 00 00 08 00 00 00 0f 12 00 2f 0a 00 56 50  .........../..VP
00000020: 38 20 3a e0 47 00 b0 ab 74 9d 01 2a 10 12 30 0a  8 :.G...t..*..0.
00000030: 3e 0d 04 82 41 01 34 00 00 18 96 76 19 2a 9f c6  >...A.4....v.*..
00000040: f0 14 f1 d7 e5 5f 64 fe 72 fa 37 fa 2f d9 5f f3  ....._d.r.7./._.
00000050: f9 10 76 7f f7 1f ff 3f e4 ff d2 fd ea f9 05 fd  ..v....?........
00000060: 57 fd 0f ff bf f7 7f f2 fd 34 7e f7 ef 1f fe 0e  W........4~.....
00000070: 93 2e 1f fe cf ef 0f b8 1f d7 5f b9 3f bf fe ca  .........._.?...
00000080: ff af fb bb ff 73 d4 63 ef bf f4 7f ff 7f dc ff  .....s.c........
00000090: c7 f2 03 fd af fd e7 ff ef fa 3f f6 3d eb 3f c5  ..........?.=.?.
000000a0: fb d5 ff a7 d4 63 df 3f ea fe f1 ff e4 f8 03 fa  .....c.?........
000000b0: bf f7 07 fe 77 ff ff 8f bf f1 7d f8 7f ff ff ff  ....w.....}.....
000000c0: ed f7 f7 7f fb 7f 7d 5f ff ff ff fd 16 fe d5 ff  ......}_........
000000d0: 23 f7 67 f7 ff e4 03 ff 9f ff ff fe bf fd fe 6f  #.g............o
000000e0: 3f 80 7f ee ff ff ff b7 fe f7 cc d7 f0 0f fb ff  ?...............
000000f0: ff ff ed 69 92 c4 1b 27 40 17 99 3f 63 e2 15 70  ...i...'@..?c..p
(base) OTT-MarceloMac:sample_photos mvaranda$ 
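The first 30 bytes of the dump can be decoded by hand to confirm that the file uses the extended format and advertises EXIF. This is a sketch that assumes the VP8X layout from the WebP container specification: FourCC, 4-byte size, 1 flags byte, 3 reserved bytes, then 24-bit width-1 and height-1. The variable names are illustrative only.

```python
import struct

# First 30 bytes of metadata_android.webp, copied from the xxd output above.
head = bytes.fromhex(
    "52494646" "14fa4700" "57454250"   # "RIFF", RIFF size, "WEBP"
    "56503858" "0a000000"              # "VP8X", chunk size (10)
    "08" "000000" "0f1200" "2f0a00"    # flags, reserved, width-1, height-1
)

riff_size = struct.unpack("<L", head[4:8])[0]
fourcc = head[12:16]
vp8x_size = struct.unpack("<L", head[16:20])[0]
flags = head[20]                       # flags byte sits after the 4-byte size
has_exif = bool(flags & 0x08)          # EXIF metadata bit
width = int.from_bytes(head[24:27], "little") + 1
height = int.from_bytes(head[27:30], "little") + 1

print(fourcc, vp8x_size, has_exif, width, height)
```

The decoded canvas size, 4624 x 2608, matches the 0x1210 and 0x0a30 values stored under tags a002 and a003 in the tail dump. Note that under this layout the flags byte is at file offset 20, not directly after the FourCC.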

Tail:

00148510: a466 511c 086e 7ac4 3209 1c2c 47fd ea4c  .fQ..nz.2..,G..L
00148520: 223c 2fa9 e1e4 d5cc 686c 7c34 836f 3ee9  "</.....hl|4.o>.
00148530: 3646 ef50 4b88 bae3 e8f0 01ea 5958 6000  6F.PK.......YX`.
00148540: 4558 4946 b419 0000 4578 6966 0000 4d4d  EXIF....Exif..MM
00148550: 002a 0000 0008 000b 010f 0002 0000 0005  .*..............
00148560: 0000 0092 0110 0002 0000 0006 0000 0098  ................
00148570: 0112 0003 0000 0001 0001 0000 011a 0005  ................
00148580: 0000 0001 0000 009e 011b 0005 0000 0001  ................
00148590: 0000 00a6 0128 0003 0000 0001 0002 0000  .....(..........
001485a0: 0131 0002 0000 0031 0000 00ae 0132 0002  .1.....1.....2..
001485b0: 0000 0014 0000 00e0 0213 0003 0000 0001  ................
001485c0: 0001 0000 8769 0004 0000 0001 0000 00f4  .....i..........
001485d0: 8825 0004 0000 0001 0000 02f2 0000 03cc  .%..............
001485e0: 536f 6e79 0000 4833 3132 3300 0000 0048  Sony..H3123....H
001485f0: 0000 0001 0000 0048 0000 0001 4833 3132  .......H....H312
00148600: 332d 7573 6572 2039 2035 302e 322e 412e  3-user 9 50.2.A.
00148610: 332e 3737 2032 3132 3231 3434 3334 3620  3.77 2122144346 
00148620: 7265 6c65 6173 652d 6b65 7973 0000 3230  release-keys..20
00148630: 3230 3a31 313a 3036 2031 323a 3231 3a32  20:11:06 12:21:2
00148640: 3800 001d 829a 0005 0000 0001 0000 0256  8..............V
00148650: 829d 0005 0000 0001 0000 025e 8822 0003  ...........^."..
00148660: 0000 0001 0000 0000 8827 0003 0000 0001  .........'......
00148670: 0032 0000 9000 0007 0000 0004 3032 3230  .2..........0220
00148680: 9003 0002 0000 0014 0000 0266 9004 0002  ...........f....
00148690: 0000 0014 0000 027a 9101 0007 0000 0004  .......z........
001486a0: 0102 0300 9201 000a 0000 0001 0000 028e  ................
001486b0: 9202 0005 0000 0001 0000 0296 9203 000a  ................
001486c0: 0000 0001 0000 029e 9204 000a 0000 0001  ................
001486d0: 0000 02a6 9207 0003 0000 0001 0002 0000  ................
001486e0: 9209 0003 0000 0001 0010 0000 920a 0005  ................
001486f0: 0000 0001 0000 02ae 9290 0002 0000 0007  ................
00148700: 0000 02b6 9291 0002 0000 0007 0000 02be  ................
00148710: 9292 0002 0000 0007 0000 02c6 a000 0007  ................
00148720: 0000 0004 3031 3030 a001 0003 0000 0001  ....0100........
00148730: 0001 0000 a002 0004 0000 0001 0000 1210  ................
00148740: a003 0004 0000 0001 0000 0a30 a005 0004  ...........0....
00148750: 0000 0001 0000 02d3 a217 0003 0000 0001  ................
00148760: 0000 0000 a301 0002 0000 0005 0000 02ce  ................
00148770: a402 0003 0000 0001 0000 0000 a403 0003  ................
00148780: 0000 0001 0000 0000 a405 0003 0000 0001  ................
00148790: 0000 0000 a406 0003 0000 0001 0000 0000  ................
001487a0: 0000 0000 0000 0001 0000 02f4 0000 00c8  ................
001487b0: 0000 0064 3230 3230 3a31 313a 3036 2031  ...d2020:11:06 1
001487c0: 323a 3231 3a32 3800 3230 3230 3a31 313a  2:21:28.2020:11:
001487d0: 3036 2031 323a 3231 3a32 3800 0000 255a  06 12:21:28...%Z
001487e0: 0000 03e8 0000 00c8 0000 0064 0000 0000  ...........d....
001487f0: 0000 0064 0000 0000 0000 0006 0000 107c  ...d...........|
00148800: 0000 03e8 3336 3830 3339 0000 3336 3830  ....368039..3680
00148810: 3339 0000 3336 3830 3339 0000 3130 3030  39..368039..1000
00148820: 0000 0200 0100 0200 0000 0452 3938 0000  ...........R98..
00148830: 0200 0700 0000 0430 3130 3000 0000 0000  .......0100.....
00148840: 0009 0001 0002 0000 0002 4e00 0000 0002  ..........N.....
00148850: 0005 0000 0003 0000 0364 0003 0002 0000  .........d......
00148860: 0002 5700 0000 0004 0005 0000 0003 0000  ..W.............
00148870: 037c 0005 0001 0000 0001 0000 0000 0006  .|..............
00148880: 0005 0000 0001 0000 0394 0007 0005 0000  ................
00148890: 0003 0000 039c 001b 0007 0000 000c 0000  ................
001488a0: 03b4 001d 0002 0000 000b 0000 03c0 0000  ................
001488b0: 0000 0000 002d 0000 0001 0000 0011 0000  .....-..........
001488c0: 0001 0005 d5d0 0000 2710 0000 004b 0000  ........'....K..
001488d0: 0001 0000 0033 0000 0001 0005 08f8 0000  .....3..........
001488e0: 2710 0001 85f2 0000 03e8 0000 0011 0000  '...............
001488f0: 0001 0000 0015 0000 0001 0000 0019 0000  ................
00148900: 0001 4153 4349 4900 0000 4750 5300 3230  ..ASCII...GPS.20
00148910: 3230 3a31 313a 3036 0000 0007 0103 0003  20:11:06........
00148920: 0000 0001 0006 0000 0112 0003 0000 0001  ................
00148930: 0001 0000 011a 0005 0000 0001 0000 0426  ...............&
00148940: 011b 0005 0000 0001 0000 042e 0128 0003  .............(..
00148950: 0000 0001 0002 0000 0201 0004 0000 0001  ................
00148960: 0000 0436 0202 0004 0000 0001 0000 1578  ...6...........x
00148970: 0000 0000 0000 0048 0000 0001 0000 0048  .......H.......H
00148980: 0000 0001 ffd8 ffdb 0084 0006 0405 0605  ................
00148990: 0406 0605 0607 0706 080a 100a 0a09 090a  ................
001489a0: 140e 0f0c 1017 1418 1817 1416 161a 1d25  ...............%
001489b0: 1f1a 1b23 1c16 1620 2c20 2326 2729 2a29  ...#... , #&')*)
001489c0: 191f 2d30 2d28 3025 2829 2801 0707 070a  ..-0-(0%()(.....
001489d0: 080a 130a 0a13 281a 161a 2828 2828 2828  ......(...((((((
001489e0: 2828 2828 2828 2828 2828 2828 2828 2828  ((((((((((((((((
001489f0: 2828 2828 2828 2828 2828 2828 2828 2828  ((((((((((((((((
00148a00: 2828 2828 2828 2828 2828 2828 ffc0 0011  ((((((((((((....
00148a10: 0800 7800 a003 0122 0002 1101 0311 01ff  ..x...."........
00148a20: c401 a200 0001 0501 0101 0101 0100 0000  ................
00148a30: 0000 0000 0001 0203 0405 0607 0809 0a0b  ................
00148a40: 1000 0201 0303 0204 0305 0504 0400 0001  ................
00148a50: 7d01 0203 0004 1105 1221 3141 0613 5161  }........!1A..Qa
00148a60: 0722 7114 3281 91a1 0823 42b1 c115 52d1  ."q.2....#B...R.
00148a70: f024 3362 7282 090a 1617 1819 1a25 2627  .$3br........%&'
00148a80: 2829 2a34 3536 3738 393a 4344 4546 4748  ()*456789:CDEFGH
00148a90: 494a 5354 5556 5758 595a 6364 6566 6768  IJSTUVWXYZcdefgh
00148aa0: 696a 7374 7576 7778 797a 8384 8586 8788  ijstuvwxyz......
00148ab0: 898a 9293 9495 9697 9899 9aa2 a3a4 a5a6  ................
00148ac0: a7a8 a9aa b2b3 b4b5 b6b7 b8b9 bac2 c3c4  ................
00148ad0: c5c6 c7c8 c9ca d2d3 d4d5 d6d7 d8d9 dae1  ................
00148ae0: e2e3 e4e5 e6e7 e8e9 eaf1 f2f3 f4f5 f6f7  ................
00148af0: f8f9 fa01 0003 0101 0101 0101 0101 0100  ................
00148b00: 0000 0000 0001 0203 0405 0607 0809 0a0b  ................
00148b10: 1100 0201 0204 0403 0407 0504 0400 0102  ................
00148b20: 7700 0102 0311 0405 2131 0612 4151 0761  w.......!1..AQ.a
00148b30: 7113 2232 8108 1442 91a1 b1c1 0923 3352  q."2...B.....#3R
00148b40: f015 6272 d10a 1624 34e1 25f1 1718 191a  ..br...$4.%.....
00148b50: 2627 2829 2a35 3637 3839 3a43 4445 4647  &'()*56789:CDEFG
00148b60: 4849 4a53 5455 5657 5859 5a63 6465 6667  HIJSTUVWXYZcdefg
00148b70: 6869 6a73 7475 7677 7879 7a82 8384 8586  hijstuvwxyz.....
00148b80: 8788 898a 9293 9495 9697 9899 9aa2 a3a4  ................
00148b90: a5a6 a7a8 a9aa b2b3 b4b5 b6b7 b8b9 bac2  ................
00148ba0: c3c4 c5c6 c7c8 c9ca d2d3 d4d5 d6d7 d8d9  ................
00148bb0: dae2 e3e4 e5e6 e7e8 e9ea f2f3 f4f5 f6f7  ................
00148bc0: f8f9 faff da00 0c03 0100 0211 0311 003f  ...............?
00148bd0: 00fa 445b b50d 1ec5 2ce7 6a8e 493d 00ae  ..D[....,.j.I=..

.... Thumbnail ....

00149ee0: 7f85 c104 fe5f e15c 747f 7ff0 ad7d 0ffe  ....._.\t....}..
00149ef0: 420b 53ec e2fa 1cd3 a713 ffd9            B.S.........
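The start of the tail dump can be decoded the same way. This sketch uses only the 24 bytes shown at 0x00148540 and assumes the usual TIFF header layout (byte-order marker, magic number 42, offset of the first IFD); names are illustrative.

```python
import struct

# First 24 bytes at 0x00148540, copied from the tail dump above.
raw = bytes.fromhex(
    "45584946" "b4190000"        # "EXIF", chunk size (little-endian)
    "457869660000"               # "Exif\0\0" prefix
    "4d4d" "002a" "00000008"     # TIFF header: byte order, magic 42, IFD0 offset
    "000b"                       # number of entries in IFD0
)

fourcc, chunk_size = struct.unpack("<4sL", raw[0:8])
tiff = raw[14:]                  # 8-byte chunk header + 6-byte prefix skipped
byte_order = ">" if tiff[0:2] == b"MM" else "<"
magic, ifd0_offset = struct.unpack(byte_order + "HL", tiff[2:8])
(entry_count,) = struct.unpack(byte_order + "H", tiff[ifd0_offset:ifd0_offset + 2])
chunk_end = 0x00148540 + 8 + chunk_size

print(fourcc, chunk_size, byte_order, magic, ifd0_offset, entry_count, hex(chunk_end))
```

The computed end, 0x149efc, coincides with the last byte shown in the dump, so the EXIF chunk appears to be the final chunk in the dumped data.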

mvaranda avatar Nov 08 '20 13:11 mvaranda

Actually, _find_webp_exif points correctly to the "EXIF" FourCC and Size: 0x47e060. It seems to be missing a seek(6,1) to point to the correct endian byte. However, the caller is still not happy with the returned offset or format.
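The effect of the missing seek can be reproduced on the bytes from the tail dump. The sketch below is not exifread code: `BYTE_ORDER_NAMES` is a hypothetical stand-in for the lookup that raised KeyError: 'E' in the traceback, and `read_marker` only mimics reading one byte after the chunk header.

```python
import io

# Hypothetical stand-in for the lookup at the failing line; 'I' and 'M' are
# the two TIFF byte-order markers.
BYTE_ORDER_NAMES = {"I": "Intel", "M": "Motorola"}

# Start of the EXIF chunk, copied from the tail dump at 0x00148540.
chunk = bytes.fromhex("45584946b41900004578696600004d4d002a")


def read_marker(skip_prefix: bool) -> str:
    """Return the byte that would be treated as the endian marker."""
    fh = io.BytesIO(chunk)
    fh.read(8)  # chunk FourCC and chunk size
    if skip_prefix:
        fh.seek(6, 1)  # step over "Exif\0\0"
    return fh.read(1).decode("ascii")


without_skip = read_marker(skip_prefix=False)
with_skip = read_marker(skip_prefix=True)
print(without_skip, without_skip in BYTE_ORDER_NAMES)
print(with_skip, BYTE_ORDER_NAMES[with_skip])
```

Without the skip, the byte read is the 'E' of "Exif", which is not a byte-order marker; with it, the read lands on the 'M' of the TIFF header.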

mvaranda avatar Nov 08 '20 14:11 mvaranda

The fix for my use case is as follows. It needs a regression test, as I do not have the history. Maybe the original code was correct for a different EXIF version.

def _find_webp_exif(fh: BinaryIO) -> tuple:
    logger.debug("WebP format recognized in data[0:4], data[8:12]")
    # file specification: https://developers.google.com/speed/webp/docs/riff_container
    data = fh.read(5)
    if data[0:4] == b'VP8X' and data[4] & 8:
        # https://developers.google.com/speed/webp/docs/riff_container#extended_file_format
        fh.seek(13, 1)
        while True:
            data = fh.read(8)  # Chunk FourCC (32 bits) and Chunk Size (32 bits)
            if len(data) != 8:
                raise InvalidExif("Invalid webp file chunk header.")
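            if data[0:4] == b'EXIF':
                # Skip the 6-byte b'Exif\x00\x00' prefix that precedes the TIFF header in this file.
                fh.seek(6, 1)
                offset = fh.tell()
                endian = fh.read(1)
                return offset, endian
            size = struct.unpack('<L', data[4:8])[0]
            # RIFF chunk payloads are padded to an even length, so skip the pad byte too.
            fh.seek(size + (size & 1), 1)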
    raise ExifNotFound("Webp file does not have exif data.")
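A regression test for this could be built from a synthetic file instead of a multi-megabyte sample. The following is a rough sketch under stated assumptions, not the project's test suite: `find_webp_exif` is a standalone variant of the fix above, `ExifNotFound` is a local stand-in for the library exception, and `chunk`, `make_webp` and `run` are hypothetical helpers. Unlike the posted code, the variant reads the flags byte after the 4-byte chunk size, skips RIFF pad bytes and treats the "Exif\0\0" prefix as optional. Those choices follow the container layout seen in the dumps and have not been checked against exifread's caller.

```python
import io
import struct
from typing import BinaryIO, Tuple


class ExifNotFound(Exception):
    """Local stand-in for the exception named in the fix."""


def find_webp_exif(fh: BinaryIO) -> Tuple[int, bytes]:
    """Standalone variant of the fix; fh must be positioned after 'RIFF....WEBP'."""
    header = fh.read(18)  # VP8X FourCC + size (8 bytes) and its 10-byte payload
    if len(header) == 18 and header[0:4] == b"VP8X" and header[8] & 0x08:
        while True:
            data = fh.read(8)  # chunk FourCC and chunk size
            if len(data) != 8:
                break
            fourcc, size = struct.unpack("<4sL", data)
            if fourcc == b"EXIF":
                prefix = fh.read(6)
                if prefix != b"Exif\x00\x00":
                    fh.seek(-len(prefix), 1)  # payload starts with the TIFF header
                offset = fh.tell()
                return offset, fh.read(1)
            fh.seek(size + (size & 1), 1)  # skip payload plus RIFF pad byte
    raise ExifNotFound("WebP file does not have EXIF data.")


def chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Serialize one chunk, adding the pad byte when the payload length is odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return fourcc + struct.pack("<L", len(payload)) + payload + pad


def make_webp(exif_payload: bytes) -> bytes:
    """Build a fake WebP whose EXIF chunk comes after the image data."""
    body = (
        b"WEBP"
        + chunk(b"VP8X", b"\x08" + bytes(9))  # only the EXIF flag set
        + chunk(b"VP8 ", bytes(7))            # fake, odd-sized image data
        + chunk(b"EXIF", exif_payload)
    )
    return b"RIFF" + struct.pack("<L", len(body)) + body


TIFF = b"MM\x00\x2a\x00\x00\x00\x08\x00\x00"  # big-endian TIFF header, empty IFD


def run(exif_payload: bytes) -> Tuple[int, bytes]:
    fh = io.BytesIO(make_webp(exif_payload))
    fh.seek(12)  # the caller has already consumed the RIFF/WEBP header
    return find_webp_exif(fh)


with_prefix = run(b"Exif\x00\x00" + TIFF)
without_prefix = run(TIFF)
print(with_prefix, without_prefix)
```

Both cases should return an offset that points at the 'M' of the TIFF header, which is the property the fix is meant to guarantee.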

mvaranda avatar Nov 08 '20 15:11 mvaranda