clickhouse-java

Serializing a FixedString when the string is longer than the size defined in the table definition

Open mzitnik opened this issue 9 months ago • 1 comments

Describe your feedback

Currently, if the string passed for serialization is longer than the FixedString column size, we send only as many bytes as the table schema defines and silently drop the rest.

A better approach would be a flag that controls what happens when the string is too long: either throw an exception or send only the part that fits the defined size.

Code example

How it is done today: writeFixedString writes only as many bytes as the column defines.

     BinaryStreamUtils.writeFixedString(stream, convertToString(value), column.getPrecision());
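The proposed flag could look roughly like the sketch below. This is only an illustration under stated assumptions: `OverflowPolicy` and `encodeFixedString` are hypothetical names, not part of clickhouse-java, and the sketch uses only the JDK instead of `BinaryStreamUtils`. It pads short values with zero bytes, which is how FixedString(N) values are stored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FixedStringOverflowSketch {
    /** Hypothetical policy flag; not an existing clickhouse-java option. */
    public enum OverflowPolicy { THROW, TRUNCATE }

    /**
     * Encodes value into exactly `length` bytes.
     * Shorter values are padded with zero bytes; longer values either
     * raise an error or are cut at the byte level, depending on the policy.
     */
    public static byte[] encodeFixedString(String value, int length, OverflowPolicy policy) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > length && policy == OverflowPolicy.THROW) {
            throw new IllegalArgumentException(
                "String takes " + bytes.length + " bytes, but column is FixedString(" + length + ")");
        }
        // Arrays.copyOf truncates when the source is longer and zero-pads when it is shorter.
        return Arrays.copyOf(bytes, length);
    }

    public static void main(String[] args) {
        byte[] truncated = encodeFixedString("abcdef", 4, OverflowPolicy.TRUNCATE);
        System.out.println(new String(truncated, StandardCharsets.UTF_8)); // prints "abcd"

        try {
            encodeFixedString("abcdef", 4, OverflowPolicy.THROW);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing the policy per call, rather than storing it once on the client, is one way to let different columns tolerate oversized strings differently.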

mzitnik, Mar 14 '25 15:03

@mzitnik While working on this issue, I had the following thoughts:

  • The flag would affect the whole client, so it would be hard to tolerate oversized strings for only a subset of columns.
  • What if we just provide a method that truncates a string to the defined length?
  • If we truncate the string in the middle of a UTF-8 character, the result is a string with a completely different meaning.

Here is the test for the last point:

    @Test
    public void testUtfStringTruncate() {
        final String str = "a☺c";
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes.length: " + bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(String.format("bytes[%d]: %x", i, bytes[i]));
        }

        // Keep 'a' plus only the first byte (0xE2) of the 3-byte '☺' sequence.
        byte[] twoBytes = new byte[] {bytes[0], bytes[1]};
        String twoByteString = new String(twoBytes, StandardCharsets.UTF_8);
        System.out.println("twoByteString: " + twoByteString);
        System.out.println("twoByteString.length: " + twoByteString.length());
    }

Output:

bytes.length: 5
bytes[0]: 61
bytes[1]: e2
bytes[2]: 98
bytes[3]: ba
bytes[4]: 63
twoByteString: a�
twoByteString.length: 2

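The correct result should be a string containing only a, because e2 is the first byte of ☺. In this case, 3 bytes are used for a single character, so cutting after e2 leaves an invalid sequence that is decoded as the replacement character.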
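One way to avoid the broken character is to move the cut back to the nearest character boundary before truncating. The sketch below is only an illustration, assuming the value is encoded as UTF-8; `truncateUtf8` is a hypothetical helper, not an existing clickhouse-java method. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8TruncateSketch {
    /**
     * Returns the UTF-8 bytes of value, cut to at most maxBytes
     * without splitting a multi-byte character.
     */
    public static byte[] truncateUtf8(String value, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return bytes;
        }
        int end = maxBytes;
        // If the first dropped byte is a continuation byte (10xxxxxx), the cut
        // falls inside a character: step back until we reach its lead byte.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(bytes, end);
    }

    public static void main(String[] args) {
        String str = "a\u263Ac"; // "a☺c": 61 e2 98 ba 63
        byte[] two = truncateUtf8(str, 2);
        System.out.println(two.length);                               // prints 1
        System.out.println(new String(two, StandardCharsets.UTF_8));  // prints "a"
        byte[] four = truncateUtf8(str, 4);
        System.out.println(four.length);                              // prints 4
    }
}
```

The result can be shorter than the requested size, so for a FixedString column the remaining bytes would still need to be padded with zero bytes.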

chernser, Aug 12 '25 06:08