base85
ASCII85: four zero bytes encoded as !!!!! instead of z
Four zero bytes are encoded as !!!!! instead of z when zero compression is enabled:
use Tuupola\Base85;

$ascii85 = new Base85([
    "characters" => Base85::ASCII85,
    "compress.spaces" => false,
    "compress.zeroes" => true
]);
print $ascii85->encode("\0\0\0\0"); // !!!!!
I tested with https://cryptii.com/pipes/ascii85-encoding and also with Python, where base64.a85encode outputs z for four zero bytes.
This is intentional: z compression is not applied to the final block. The input is padded with 0x00 bytes to a multiple of 4, so the decoder needs to be able to tell whether the final zero bytes are padding or actual data.
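A minimal Python sketch of the encoding side may make the padding rule concrete. It is written for this thread, not taken from the library, and the name a85_encode is hypothetical. It pads the input with zero bytes to a multiple of 4, turns each 4-byte group into five base-85 digits offset by 33 (the character !), and, following the behaviour described above, never emits z for the final group. No <~ ~> delimiters and no y (spaces) compression are handled.

```python
def a85_encode(data: bytes, compress_zeroes: bool = True) -> str:
    """Hypothetical minimal ASCII85 encoder (no delimiters, no y compression).

    Like the behaviour described in this thread, z is never emitted for the
    final group, whether or not that group was padded.
    """
    padding = (-len(data)) % 4              # zero bytes needed to reach a multiple of 4
    padded = data + b"\0" * padding
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    out = []
    for index, group in enumerate(groups):
        value = int.from_bytes(group, "big")
        is_last = index == len(groups) - 1
        if compress_zeroes and value == 0 and not is_last:
            out.append("z")                 # four zero bytes in a non-final group
            continue
        digits = []
        for _ in range(5):                  # least significant digit first
            value, rem = divmod(value, 85)
            digits.append(chr(rem + 33))
        chunk = "".join(reversed(digits))
        if is_last and padding:
            chunk = chunk[:5 - padding]     # n real bytes -> keep n + 1 characters
        out.append(chunk)
    return "".join(out)


print(a85_encode(bytes.fromhex("aabbccddee")))  # Wk6L2mJ
print(a85_encode(bytes.fromhex("aabbccdd00")))  # Wk6L2!!
print(a85_encode(b"\0\0\0\0"))                  # !!!!!
print(a85_encode(b"\0" * 8))                    # z!!!!!
```

Skipping z for every final group is the conservative choice. The Adobe text quoted later in this thread only forbids z when the final group is partial (1, 2 or 3 bytes), so a complete final group of four zero bytes can still be written as z, which is what Python's base64.a85encode does.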
For example, if the data is:
0xaabbccddee
the padded four-byte blocks would be:
0xaabbccdd
0xee000000
print $ascii85->encode(hex2bin("aabbccddee"));
/* Wk6L2mJ */
print bin2hex($ascii85->decode("Wk6L2mJ"));
/* aabbccddee */
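A matching decoder sketch in Python (hypothetical names, not the library's code) shows how the partial final group is read back. The missing characters are filled with u, the highest digit (84), and only n bytes of the resulting group are kept, which is how Wk6L2mJ returns exactly the five bytes aabbccddee. A z is always expanded to four zero bytes, with no way to know how many of them were padding.

```python
def _group_value(digits: list[int]) -> int:
    """Combine base-85 digits (most significant first) into one integer."""
    value = 0
    for digit in digits:
        value = value * 85 + digit
    return value


def a85_decode(text: str) -> bytes:
    """Hypothetical minimal ASCII85 decoder: no <~ ~> delimiters, no whitespace, no y."""
    out = bytearray()
    group: list[int] = []
    for ch in text:
        if ch == "z":
            if group:
                raise ValueError("z inside a 5-character group")
            out += b"\0\0\0\0"              # z always means exactly four zero bytes
            continue
        digit = ord(ch) - 33
        if not 0 <= digit <= 84:
            raise ValueError(f"invalid ASCII85 character: {ch!r}")
        group.append(digit)
        if len(group) == 5:
            out += _group_value(group).to_bytes(4, "big")
            group = []
    if group:
        # A partial final group of n + 1 characters carries n bytes (n = 1, 2 or 3).
        if len(group) == 1:
            raise ValueError("a final group needs at least 2 characters")
        missing = 5 - len(group)
        group += [84] * missing             # pad with 'u', the highest digit
        out += _group_value(group).to_bytes(4, "big")[:4 - missing]
    return bytes(out)


print(a85_decode("Wk6L2mJ").hex())  # aabbccddee
print(a85_decode("Wk6L2!!").hex())  # aabbccdd00
```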
If, however, the data is:
0xaabbccdd00
the padded four-byte blocks would be:
0xaabbccdd
0x00000000
With the current behaviour, z compression is not applied to the last block:
print $ascii85->encode(hex2bin("aabbccdd00"));
/* Wk6L2!! */
print bin2hex($ascii85->decode("Wk6L2!!"));
/* aabbccdd00 */
However, if z compression were also applied to the last block, the decoder could no longer tell which zero bytes are padding and which are data. You can test this by commenting out these lines, which produces:
print $ascii85->encode(hex2bin("aabbccdd00"));
/* Wk6L2z */
print bin2hex($ascii85->decode("Wk6L2z"));
/* aabbccdd00000000 */
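The same ambiguity can be reproduced with Python's standard base64.a85decode, independently of the PHP library. Both inputs share the full first group Wk6L2 (0xaabbccdd); the only difference is whether the trailing zero byte was kept as a partial group or folded into z.

```python
import base64

# Partial final group: 2 characters decode to exactly 1 byte.
kept = base64.a85decode(b"Wk6L2!!")
# z always decodes to four zero bytes, so the original length is lost.
folded = base64.a85decode(b"Wk6L2z")

print(kept.hex())    # aabbccdd00
print(folded.hex())  # aabbccdd00000000
```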
You can also see that the Cryptii page gives the wrong result for aabbccdd00 input.
Where did you find "the z compression is not added to the last block"? I would like to read about it.
It is described at least in the Adobe documents Document management — Portable document format — Part 1: PDF 1.7 and PostScript® Language Reference, third edition. The relevant parts are:
"If the length of the data to be encoded is not a multiple of 4 bytes, the last, partial group of 4 shall be used to produce a last, partial group of 5 output characters. Given n (1, 2, or 3) bytes of binary data, the encoder shall first append 4 - n zero bytes to make a complete group of 4. It shall encode this group in the usual way, but shall not apply the special z case. Finally, it shall write only the first n + 1 characters of the resulting group of 5. These characters shall be immediately followed by the ~> EOD marker."
and
"If the ASCII85Encode filter is closed when the number of characters written to it is not a multiple of 4, it uses the characters of the last, partial 4-tuple to produce a last, partial 5-tuple of output. Given n (1, 2, or 3) bytes of binary data, it first appends 4 − n zero bytes to make a complete 4-tuple. Then, it encodes the 4-tuple in the usual way, but without applying the z special case. Finally, it writes the first n + 1 bytes of the resulting 5-tuple. Those bytes are followed immediately by the ~> EOD marker. This information is sufficient to correctly encode the number of final bytes and the values of those bytes."
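Python's base64.a85encode can be used to check the quoted rule. For aabbccdd00 the final group holds n = 1 byte, so it is padded with three zero bytes, encoded without the z special case, and cut to n + 1 = 2 characters. With adobe=True the output is wrapped in <~ and the ~> EOD marker. A lone group of four zero bytes is a complete group rather than a partial one, so the quoted text does not forbid z there.

```python
import base64

data = bytes.fromhex("aabbccdd00")  # n = 1 byte in the final, partial group

# Padded to 0xaabbccdd 0x00000000; z is skipped and n + 1 = 2 characters are kept.
partial = base64.a85encode(data)
# adobe=True adds the <~ start marker and the ~> EOD marker.
framed = base64.a85encode(data, adobe=True)
# Four zero bytes form a complete group, so the z special case applies.
full_zero = base64.a85encode(b"\0\0\0\0")

print(partial)    # b'Wk6L2!!'
print(framed)     # b'<~Wk6L2!!~>'
print(full_zero)  # b'z'
```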