DocBleach
Properly handle non-UTF8 filenames in Zip
at java.base/java.lang.StringCoding.throwMalformed(StringCoding.java:685)
at java.base/java.lang.StringCoding.decodeUTF8_0(StringCoding.java:768)
at java.base/java.lang.StringCoding.newStringUTF8NoRepl(StringCoding.java:965)
at java.base/java.lang.System$2.newStringUTF8NoRepl(System.java:2197)
at java.base/java.util.zip.ZipCoder$UTF8.toString(ZipCoder.java:60)
at java.base/java.util.zip.ZipCoder.toString(ZipCoder.java:87)
at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:301)
at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:123)
at xyz.docbleach.module.zip.ArchiveBleach.sanitize(ArchiveBleach.java:44)
at xyz.docbleach.api.bleach.CompositeBleach.sanitize(CompositeBleach.java:74)
at xyz.docbleach.api.BleachSession.sanitize(BleachSession.java:71)
at xyz.docbleach.cli.Main.sanitize(Main.java:81)
at xyz.docbleach.cli.Main.main(Main.java:54)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
... 13 more
Process finished with exit code 1
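The trace shows the entry name being decoded as UTF-8 (`ZipCoder$UTF8`) inside `ZipInputStream.getNextEntry`. One possible way to handle this, as a rough sketch only: try UTF-8 first and, if the name is rejected, re-read the archive with a fallback charset through the `ZipInputStream(InputStream, Charset)` constructor. `ZipNameFallback`, `listNames` and `listNamesWithFallback` are hypothetical names for illustration and not part of DocBleach; the right fallback charset for a given archive (GBK, CP437, ...) is an assumption the caller has to make.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

/** Hypothetical helper for illustration; not part of DocBleach. */
final class ZipNameFallback {

    private ZipNameFallback() {}

    /** Lists all entry names, decoding non-flagged names with the given charset. */
    static List<String> listNames(byte[] zipBytes, Charset charset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes), charset)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /**
     * Tries UTF-8 first. If a name cannot be decoded, starts over from the
     * beginning of the archive with the fallback charset.
     */
    static List<String> listNamesWithFallback(byte[] zipBytes, Charset fallback)
            throws IOException {
        try {
            return listNames(zipBytes, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException | ZipException e) {
            // The trace above surfaces the malformed name as an unchecked
            // exception; ZipException is caught too in case a JDK reports it
            // that way (an assumption, not verified on every version).
            return listNames(zipBytes, fallback);
        }
    }
}
```

A single-byte fallback such as ISO-8859-1 never fails to decode, but it may produce names that differ from what the archive's author intended, so the result is best treated as an opaque identifier rather than a display name.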
Sample file: e35d68feda25f401da03883da1e9c437
Archive: VirusShare_e35d68feda25f401da03883da1e9c437
Zip file size: 1978422 bytes, number of entries: 4
drwx--- 3.1 fat 0 bx stor 13-Jun-24 09:44 bulletstorm-trainer18/
-rwxa-- 3.1 fat 2030080 bx defN 11-Mar-12 21:16 bulletstorm-trainer18/BS+28Tr-LinGon.exe
-rw-a-- 3.1 fat 893 tx defN 13-Jun-24 09:48 bulletstorm-trainer18/+�+���+��+��.txt
-rw-a-- 3.1 fat 151 tx defN 13-Mar-29 17:14 bulletstorm-trainer18/+�+���+��+��.url
4 files, 2031124 bytes uncompressed, 1977223 bytes compressed: 2.7%
Full zipinfo:
Archive: VirusShare_e35d68feda25f401da03883da1e9c437
The zipfile comment is 241 bytes long and contains the following text:
======================== zipfile comment begins ==========================
������� http://www.cr173.com
�������������������û���⡣
�ٶ�һ�¡��������������ϲ������Ŷ��
--------------------------------------------------
--------------------------------------------------
��ѹ���룺www.cr173.com
========================= zipfile comment ends ===========================
End-of-central-directory record:
-------------------------------
Zip archive file size: 1978422 (00000000001E3036h)
Actual end-cent-dir record offset: 1978159 (00000000001E2F2Fh)
Expected end-cent-dir record offset: 1978159 (00000000001E2F2Fh)
(based on the length of the central directory and its expected offset)
This zipfile constitutes the sole disk of a single-part archive; its
central directory contains 4 entries.
The central directory is 572 (000000000000023Ch) bytes long,
and its (expected) offset in bytes from the beginning of the zipfile
is 1977587 (00000000001E2CF3h).
Central directory entry #1:
---------------------------
bulletstorm-trainer18/
offset of local header from start of archive: 0
(0000000000000000h) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 3.1
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 1.0
compression method: none (stored)
file security status: not encrypted
extended local header: no
file last modified on (DOS date/time): 2013 Jun 24 09:44:28
32-bit CRC value (hex): 00000000
compressed size: 0 bytes
uncompressed size: 0 bytes
length of filename: 22 characters
length of extra field: 36 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: binary
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (10 hex): dir
The central-directory extra field contains:
- A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes. The first
20 are: 00 00 00 00 01 00 18 00 d4 e6 d4 5d 7c 70 ce 01 d4 e6 d4 5d.
There is no file comment.
Central directory entry #2:
---------------------------
There are an extra -36 bytes preceding this file.
bulletstorm-trainer18/BS+28Tr-LinGon.exe
offset of local header from start of archive: 52
(0000000000000034h) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 3.1
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: not encrypted
extended local header: no
file last modified on (DOS date/time): 2011 Mar 12 21:16:50
32-bit CRC value (hex): 3083c7e1
compressed size: 1976519 bytes
uncompressed size: 2030080 bytes
length of filename: 40 characters
length of extra field: 36 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: binary
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (20 hex): arc
The central-directory extra field contains:
- A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes. The first
20 are: 00 00 00 00 01 00 18 00 20 37 fb bf b7 e0 cb 01 66 31 2f 59.
There is no file comment.
Central directory entry #3:
---------------------------
There are an extra -36 bytes preceding this file.
bulletstorm-trainer18/+�+���+��+��.txt
offset of local header from start of archive: 1976641
(00000000001E2941h) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 3.1
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: not encrypted
extended local header: no
file last modified on (DOS date/time): 2013 Jun 24 09:48:12
32-bit CRC value (hex): 558ee91a
compressed size: 570 bytes
uncompressed size: 893 bytes
length of filename: 38 characters
length of extra field: 89 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: text
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (20 hex): arc
The central-directory extra field contains:
- A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes. The first
20 are: 00 00 00 00 01 00 18 00 aa 21 43 e3 7c 70 ce 01 1c 87 ee 5b.
- A subfield with ID 0x7075 (UTF8 path name) and 49 data bytes. The first
24 UTF8 bytes in the extra field (V1, ASCII name CRC `f37f78d7') are:
62 75 6c 6c 65 74 73 74 6f 72 6d 2d 74 72 61 69 6e 65 72 31 38 2f e8 a5.
There is no file comment.
Central directory entry #4:
---------------------------
There are an extra -36 bytes preceding this file.
bulletstorm-trainer18/+�+���+��+��.url
offset of local header from start of archive: 1977332
(00000000001E2BF4h) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 3.1
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: not encrypted
extended local header: no
file last modified on (DOS date/time): 2013 Mar 29 17:14:02
32-bit CRC value (hex): fcf30365
compressed size: 134 bytes
uncompressed size: 151 bytes
length of filename: 38 characters
length of extra field: 89 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: text
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (20 hex): arc
The central-directory extra field contains:
- A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes. The first
20 are: 00 00 00 00 01 00 18 00 b8 1b 98 c1 5d 2c ce 01 1c 87 ee 5b.
- A subfield with ID 0x7075 (UTF8 path name) and 49 data bytes. The first
24 UTF8 bytes in the extra field (V1, ASCII name CRC `1b3e623c') are:
62 75 6c 6c 65 74 73 74 6f 72 6d 2d 74 72 61 69 6e 65 72 31 38 2f e8 a5.
There is no file comment.
4 files, 2031124 bytes uncompressed, 1977223 bytes compressed: 2.7%
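Entries #3 and #4 carry a `0x7075` (UTF8 path name) extra subfield, so the intended name could in principle be recovered from there instead of from the raw header name. A minimal sketch of parsing that subfield, assuming the Info-ZIP layout (2-byte ID, 2-byte size, 1-byte version, 4-byte CRC of the header name, then the UTF-8 name), all little-endian. `UnicodePathField` and `unicodePath` are hypothetical names, and the CRC is deliberately not verified here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of DocBleach. */
final class UnicodePathField {

    /** Header ID of the Info-ZIP Unicode Path extra subfield. */
    static final int UNICODE_PATH_ID = 0x7075;

    private UnicodePathField() {}

    /** Returns the UTF-8 name stored in a version 1 0x7075 subfield, if any. */
    static Optional<String> unicodePath(byte[] extra) {
        if (extra == null) {
            return Optional.empty();
        }
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = buf.getShort() & 0xFFFF;
            int size = buf.getShort() & 0xFFFF;
            if (size > buf.remaining()) {
                break; // truncated subfield, stop parsing
            }
            // Payload: version (1 byte), CRC-32 of the header name (4 bytes), name.
            if (id == UNICODE_PATH_ID && size >= 5 && buf.get(buf.position()) == 1) {
                int nameStart = buf.position() + 5;
                int nameLength = size - 5;
                // Note: the CRC is skipped, so a stale field would go unnoticed.
                return Optional.of(
                    new String(extra, nameStart, nameLength, StandardCharsets.UTF_8));
            }
            buf.position(buf.position() + size); // skip to the next subfield
        }
        return Optional.empty();
    }
}
```

With `java.util.zip`, the input would typically come from `ZipEntry.getExtra()`. The dump above describes the central directory, so whether the local headers read by `ZipInputStream` carry the same subfield for this sample is an assumption that would need checking.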