
Can't Serialize

Open manticore-projects opened this issue 3 years ago • 3 comments

Greetings! First of all, a big thank you for providing this library. I already use it very successfully in JSQLFormatter for the URL parameters: http://jsqlformatter.manticore-projects.com/jsqlformatter/demo.html?args=-c%20MoUQMiDCAqAEsEYBQ9WoDSwExIGICUB5AWVgBMBXAQwBskB1ACRHxFitgF5YAjJAbiA

Now I also wanted to use it for serializing Java objects into XML, and I was expecting the following code to work:

public static String encodeObject(Object object) throws IOException {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    ObjectOutput objectOutput= new ObjectOutputStream(byteArrayOutputStream);
    objectOutput.writeObject(object);
    objectOutput.flush();
    objectOutput.close();
    byteArrayOutputStream.flush();
    
    String s = new String(byteArrayOutputStream.toByteArray());
    return LZSEncoding.compressToBase64(s);
}

As I understand it, this should give a Base64-encoded String that I can write into the XML. Writing works well, of course; however, I get a Corrupted Stream message when deserializing.

Oddly enough, the following code works around that problem:

public static String encodeObject(Object object) throws IOException {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    ObjectOutput objectOutput= new ObjectOutputStream(byteArrayOutputStream);
    objectOutput.writeObject(object);
    objectOutput.flush();
    objectOutput.close();
    byteArrayOutputStream.flush();

    String s = Base64.getEncoder().encodeToString(byteArrayOutputStream.toByteArray());
    return LZSEncoding.compressToBase64(s);
}

My question is: where is my understanding wrong, and why does Base64 get the encoding right when LZSEncoding does not? What am I missing here, please?

manticore-projects, Sep 10 '22 10:09

Sorry to bother you, I figured out that serialization works only with StandardCharsets.ISO_8859_1. However, now I am even more confused: the plain Base64 encoder returns shorter Strings than LZSEncoding?!

@Test
  public void testSerialization() throws IOException, ClassNotFoundException {
    Object object = new BigDecimal("2345.287272");

    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    ObjectOutput objectOutput= new ObjectOutputStream(byteArrayOutputStream);
    objectOutput.writeObject(object);
    objectOutput.flush();
    objectOutput.close();
    byteArrayOutputStream.flush();

    String serializedObjectStr = new String(byteArrayOutputStream.toByteArray(), StandardCharsets.ISO_8859_1);

    String lzsEncodedBase64 = LZSEncoding.compressToBase64( serializedObjectStr );
    String base64Encoded = Base64.getEncoder().encodeToString(byteArrayOutputStream.toByteArray());

    // Why is Base64 Encoder more efficient?!
    System.out.println(serializedObjectStr + "\n"
                       + lzsEncodedBase64 + "\n"
                       + base64Encoded);

    // verify the ISO_8859_1 String round trip (no Base64 decoding happens here)
    byte[] bytes = serializedObjectStr.getBytes(StandardCharsets.ISO_8859_1);
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
    ObjectInputStream objectInputStream = new ObjectInputStream(byteArrayInputStream);
    Assertions.assertEquals(object, objectInputStream.readObject());
    objectInputStream.close();
    byteArrayInputStream.close();

    // verify LZSEncoder
    bytes = LZSEncoding.decompressFromBase64(lzsEncodedBase64).getBytes(StandardCharsets.ISO_8859_1);
    byteArrayInputStream = new ByteArrayInputStream(bytes);
    objectInputStream = new ObjectInputStream(byteArrayInputStream);
    Assertions.assertEquals(object, objectInputStream.readObject());
    objectInputStream.close();
    byteArrayInputStream.close();

  }
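
The charset finding above can be checked in isolation, without any compression involved. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration. It assumes the failing version decoded the bytes as UTF-8, which is what `new String(byte[])` does on many setups because it uses the platform default charset. ISO_8859_1 maps every byte value to exactly one char, so the round trip is lossless, whereas UTF-8 replaces byte sequences it cannot decode with U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetRoundTrip {
    /** Decodes the bytes to a String with the given charset, then encodes them back. */
    static byte[] roundTrip(byte[] bytes, Charset charset) {
        return new String(bytes, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // 0xAC 0xED 0x00 0x05: the magic number and version that start a Java serialization stream.
        byte[] header = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05};

        byte[] viaLatin1 = roundTrip(header, StandardCharsets.ISO_8859_1);
        byte[] viaUtf8 = roundTrip(header, StandardCharsets.UTF_8);

        // ISO_8859_1 preserves the bytes; UTF_8 does not, because 0xAC and 0xED
        // are not valid UTF-8 here and get replaced during decoding.
        System.out.println("ISO_8859_1 lossless: " + Arrays.equals(header, viaLatin1));
        System.out.println("UTF_8 lossless:      " + Arrays.equals(header, viaUtf8));
    }
}
```

This would also explain why the Base64 workaround succeeds: a Base64 String contains only ASCII characters, so no charset conversion can damage it.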

manticore-projects, Sep 10 '22 10:09

The plain Base64 encoder likely wins here because compression can't keep shrinking data indefinitely; the serialized representation of a BigDecimal is probably not easily compressible (it may lack repetitive bit sequences). I didn't write the LZ-String algorithm, so I don't know exactly how much overhead it requires, but some small amount of extra data needs to be in the compressed String so it can be decompressed correctly. That overhead might push Blazing-Chain's output to a larger size, though I'll run the tests you provided and see how much bigger. The Lempel-Ziv compression schemes, which LZ-String descends from, are dictionary compression algorithms: they compress commonly found "words" or "phrases" (in bits or bytes) to shorter representations than rarely encountered ones.
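
To make the dictionary idea concrete, here is a small sketch of a textbook LZW-style pass that only counts how many dictionary codes it would emit. This is not the LZ-String or BlazingChain implementation, and `LzwSketch` and `lzwCodeCount` are hypothetical names; it only illustrates why repeated phrases help and why input without repetition gains nothing.

```java
import java.util.HashSet;
import java.util.Set;

final class LzwSketch {
    /** Counts the codes a simplified LZW pass emits; every single char counts as already known. */
    static int lzwCodeCount(String input) {
        Set<String> dictionary = new HashSet<>();
        String current = "";
        int codes = 0;
        for (int i = 0; i < input.length(); i++) {
            String candidate = current + input.charAt(i);
            if (candidate.length() == 1 || dictionary.contains(candidate)) {
                // Known phrase: keep extending it instead of emitting anything.
                current = candidate;
            } else {
                // Unknown phrase: emit the code for what we had, remember the new phrase.
                codes++;
                dictionary.add(candidate);
                current = String.valueOf(input.charAt(i));
            }
        }
        if (!current.isEmpty()) {
            codes++; // flush the last pending phrase
        }
        return codes;
    }

    public static void main(String[] args) {
        String repetitive = "abababababababab";
        String distinct = "abcdefgh";
        System.out.println(repetitive.length() + " chars -> " + lzwCodeCount(repetitive) + " codes");
        System.out.println(distinct.length() + " chars -> " + lzwCodeCount(distinct) + " codes");
    }
}
```

With no repeated phrases, the pass emits one code per input char, and each code still has to be written with enough bits to address the dictionary, so the output can end up no smaller than the input.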

tommyettinger, Sep 13 '22 03:09

The output of that latest test, for convenience:

Checking data: 2345.287272
¬í sr java.math.BigDecimalTÇWù(O I scaleL intValt Ljava/math/BigInteger;xr java.lang.Number†¬•”à‹  xp   sr java.math.BigIntegerŒüŸ©;û IbitCountI 	bitLengthI firstNonzeroByteNumI lowestSetBitI signum[ 	magnitudet [Bxq ~ ÿÿÿÿÿÿÿÿÿÿÿþÿÿÿþ   ur [B¬óøTà  xp   ‹Ê>hxx
Has length 294
DUW4AAoAzgTmAoArAhgN2QOgLbIC4AsMAhASwHMARAUwGMScAbAFQHGBUAdQE+BAgCgDyAYDAAgAJKQoNZAyoAZMADASAO1wA1WbjAA0eSnQB6HASOky49VTJUYAbgAecAAiHMDZKrIYAcgFcsACM7ADDgAFSAXABoABSAA4Bo0TAwRwAHNOVYBHdsPEILK1wbOwAYgB+AfIB8AEr7AG+okSVJAAggklwAYQB7f3VJAEhu3HkqbwJJAGQAMxIYKFxfftUALzt+ogBPUoCsSQAYBn6AdyoVgGUqXFJcSSUoclVAgG0wEZwyVR7/AAmdzE7yIjgAjmAAH5iAD/8IRiNhAD+EcjsgBAfxwUSg4AAZ4A6AAPpQdJgJVLpLJpAAgSQAUwA+fCORxAA===
Has length 400
rO0ABXNyABRqYXZhLm1hdGguQmlnRGVjaW1hbFTHFVf5gShPAwACSQAFc2NhbGVMAAZpbnRWYWx0ABZMamF2YS9tYXRoL0JpZ0ludGVnZXI7eHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAAABnNyABRqYXZhLm1hdGguQmlnSW50ZWdlcoz8nx+pO/sdAwAGSQAIYml0Q291bnRJAAliaXRMZW5ndGhJABNmaXJzdE5vbnplcm9CeXRlTnVtSQAMbG93ZXN0U2V0Qml0SQAGc2lnbnVtWwAJbWFnbml0dWRldAACW0J4cQB+AAL///////////////7////+AAAAAXVyAAJbQqzzF/gGCFTgAgAAeHAAAAAEi8o+aHh4
Has length 392
Checking data: [2345.287272, 23452.87272, 234528.7272, 2345287.272]
¬í sr java.util.Arrays$ArrayListÙ¤<¾ÍˆÒ [ at [Ljava/lang/Object;xpur [Ljava.math.BigDecimal;Hókr‹7	<  xp   sr java.math.BigDecimalTÇWù(O I scaleL intValt Ljava/math/BigInteger;xr java.lang.Number†¬•”à‹  xp   sr java.math.BigIntegerŒüŸ©;û IbitCountI 	bitLengthI firstNonzeroByteNumI lowestSetBitI signum[ 	magnitudet [Bxq ~ ÿÿÿÿÿÿÿÿÿÿÿþÿÿÿþ   ur [B¬óøTà  xp   ‹Ê>hxxsq ~    sq ~ 	ÿÿÿÿÿÿÿÿÿÿÿþÿÿÿþ   uq ~    ‹Ê>hxxsq ~    sq ~ 	ÿÿÿÿÿÿÿÿÿÿÿþÿÿÿþ   uq ~    ‹Ê>hxxsq ~    sq ~ 	ÿÿÿÿÿÿÿÿÿÿÿþÿÿÿþ   uq ~    ‹Ê>hxx
Has length 563
DUW4AAoAzgTmBYArAhgN2QOgK4BcCWANhgIIwzICeUAJKeRQDJ5Q4CbAJQDwB9AswBEAwAEsAgMAEAA2pOQ4wAZCkMU6APQFkAOwDmagPIAjRAFMAxjgDcADwAOWOAHRlqzAFs5ACwwAhPDoARczwPAksACQBngGsYAGiAdgBITnEwOzBMgBBYMAAUVwwPHG8/QODQgBUAcYBUAHUAT4BAgAp9AGAwUQBJSCgzZAITBjBBPC0cADUh+QA0FTRkNWLPNTKeyZMdExgbOAAEQs1dDAA5LDdDXYAw4ABUgFwAaAAUgAO4tIzMwVyCpZFLy+fybHDbXYAGIAPwB8gD4AErLABvx5dQR9AAQhjwOAAwgB7LCTPpJXE4BgmXQlPoKABmeBgLDOhK0AC9doSfBRwRc3H0ADAEQkAdxMLAAyiYcH4cH0/v4tJcZEkPDotHisAATGXdKQ+awARzAAD8wABwAD/NttdqtAD/bQ7MpIHPqfMBIo4AB+CLGVd7fWyurJxABTAD5PNZrFATeaIK7oAmwEl7RmnTaXZkJFhU4LQxHo7H42bIKGy+b0xm7VnHa68wWi1GY3HU0nMh0q2na3XnY38+XC9li22gA=
Has length 612
rO0ABXNyABpqYXZhLnV0aWwuQXJyYXlzJEFycmF5TGlzdNmkPL7NiAbSAgABWwABYXQAE1tMamF2YS9sYW5nL09iamVjdDt4cHVyABdbTGphdmEubWF0aC5CaWdEZWNpbWFsO0jza3KLNwk8AgAAeHAAAAAEc3IAFGphdmEubWF0aC5CaWdEZWNpbWFsVMcVV/mBKE8DAAJJAAVzY2FsZUwABmludFZhbHQAFkxqYXZhL21hdGgvQmlnSW50ZWdlcjt4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAGc3IAFGphdmEubWF0aC5CaWdJbnRlZ2VyjPyfH6k7+x0DAAZJAAhiaXRDb3VudEkACWJpdExlbmd0aEkAE2ZpcnN0Tm9uemVyb0J5dGVOdW1JAAxsb3dlc3RTZXRCaXRJAAZzaWdudW1bAAltYWduaXR1ZGV0AAJbQnhxAH4AB////////////////v////4AAAABdXIAAltCrPMX+AYIVOACAAB4cAAAAASLyj5oeHhzcQB+AAUAAAAFc3EAfgAJ///////////////+/////gAAAAF1cQB+AAwAAAAEi8o+aHh4c3EAfgAFAAAABHNxAH4ACf///////////////v////4AAAABdXEAfgAMAAAABIvKPmh4eHNxAH4ABQAAAANzcQB+AAn///////////////7////+AAAAAXVxAH4ADAAAAASLyj5oeHg=
Has length 752
Checking data: Arrays.asList(new BigDecimal("2345.287272"), new BigDecimal("23452.87272"), new BigDecimal("234528.7272"), new BigDecimal("2345287.272"))
¬í t ‰Arrays.asList(new BigDecimal("2345.287272"), new BigDecimal("23452.87272"), new BigDecimal("234528.7272"), new BigDecimal("2345287.272"))
Has length 144
DUW4AAoALmCRCCAnRBDAngZwHQowGQEsMoAKAOwFMB3AAgCECBzAEQoGMCBbFAGxICIATAGYALAFYsggBwB2QfP4BKADQ1KtBi3ZdeAkRMFY5CwcrUb6TVh258hY8TKzzFq9dSvbbeh4blSbkpAA
Has length 148
rO0ABXQAiUFycmF5cy5hc0xpc3QobmV3IEJpZ0RlY2ltYWwoIjIzNDUuMjg3MjcyIiksIG5ldyBCaWdEZWNpbWFsKCIyMzQ1Mi44NzI3MiIpLCBuZXcgQmlnRGVjaW1hbCgiMjM0NTI4LjcyNzIiKSwgbmV3IEJpZ0RlY2ltYWwoIjIzNDUyODcuMjcyIikp
Has length 192
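
The Base64 lengths in this output are fully determined by the input size: padded Base64 emits 4 characters for every started group of 3 bytes. The sketch below, with a hypothetical `base64Length` helper, checks that formula against `java.util.Base64` for the three raw lengths reported above and prints how the LZS lengths compare. It assumes each raw String char corresponds to one byte, which holds for ISO_8859_1.

```java
import java.util.Base64;

final class Base64Length {
    /** Padded Base64 length for n input bytes: 4 chars per started 3-byte group. */
    static int base64Length(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        // "Has length" values copied from the output above.
        int[] rawLengths = {294, 563, 144};
        int[] lzsLengths = {400, 612, 148};

        for (int i = 0; i < rawLengths.length; i++) {
            int viaFormula = base64Length(rawLengths[i]);
            // The content of the bytes does not matter for the length, only their count.
            int viaEncoder = Base64.getEncoder().encodeToString(new byte[rawLengths[i]]).length();
            double lzsRelative = (double) lzsLengths[i] / viaFormula;
            System.out.printf("raw=%d base64(formula)=%d base64(encoder)=%d lzs=%d lzs/base64=%.2f%n",
                    rawLengths[i], viaFormula, viaEncoder, lzsLengths[i], lzsRelative);
        }
    }
}
```

Plain Base64 always costs about a third more than the raw bytes, so the LZS output only comes out ahead when the data has enough repetition, as in the second and third cases (612 vs. 752 and 148 vs. 192), and falls slightly behind in the first (400 vs. 392).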

tommyettinger, Sep 13 '22 06:09