clickhouse-java
Serializing a FixedString when the string is longer than the size defined in the table definition
Describe your feedback
Currently, if the string passed for serialization is longer than the column size, we send only the length defined in the table schema.
A better approach would be a flag that controls what happens when the string is too long: either throw an exception or send only a string of the defined size.
Code example
How it is done today: writeFixedString writes only the length defined for the column.
BinaryStreamUtils.writeFixedString(stream, convertToString(value), column.getPrecision());
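The proposed flag could look roughly like the sketch below. Everything in it is hypothetical: the class FixedStringPolicy, the enum Overflow and the method toFixedBytes are illustration names, not part of clickhouse-java. It assumes the value is encoded as UTF-8 and that shorter values are padded with zero bytes to the column length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of clickhouse-java.
final class FixedStringPolicy {
    // Hypothetical flag: what to do when the value does not fit the column.
    enum Overflow { THROW, TRUNCATE }

    // Returns exactly `length` bytes for a FixedString(length) column.
    static byte[] toFixedBytes(String value, int length, Overflow policy) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > length && policy == Overflow.THROW) {
            throw new IllegalArgumentException(
                "Value needs " + bytes.length + " bytes but the column holds " + length);
        }
        // Arrays.copyOf cuts longer input and zero-pads shorter input.
        return Arrays.copyOf(bytes, length);
    }
}
```

With Overflow.TRUNCATE this reproduces the byte-level cut described above; with Overflow.THROW the caller finds out that data would be lost.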
@mzitnik While working on this issue I had the following thoughts:
- the flag will affect the whole client, so it would be problematic to tolerate longer strings for only a subset of columns
- what if we just have a method that truncates the string to the defined length?
- if we truncate the string in the middle of a UTF-8 character, the result is a string with a completely different meaning.
Here is the test for the last point:
@Test
public void testUtfStringTruncate() {
    final String str = "a☺c";
    byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
    System.out.println("bytes.length: " + bytes.length);
    for (int i = 0; i < bytes.length; i++) {
        System.out.println(String.format("bytes[%d]: %x", i, bytes[i]));
    }
    byte[] twoBytes = new byte[] {bytes[0], bytes[1]};
    String twoByteString = new String(twoBytes, StandardCharsets.UTF_8);
    System.out.println("twoByteString: " + twoByteString);
    System.out.println("twoByteString.length: " + twoByteString.length());
}
Output:
bytes.length: 5
bytes[0]: 61
bytes[1]: e2
bytes[2]: 98
bytes[3]: ba
bytes[4]: 63
twoByteString: a�
twoByteString.length: 2
The correct result should be a string containing only a, because e2 is the first byte of ☺, which takes 3 bytes in UTF-8.
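One way to avoid cutting a character in half is to move the cut position back to the nearest character boundary. The sketch below is only an illustration under the assumption that the input is valid UTF-8; the class Utf8Truncate and the method truncateUtf8 are hypothetical names, not existing client API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of clickhouse-java.
final class Utf8Truncate {
    // Returns at most maxBytes bytes of valid UTF-8 input without splitting a character.
    static byte[] truncateUtf8(byte[] utf8, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        if (utf8.length <= maxBytes) {
            return utf8; // already fits, nothing to cut
        }
        int end = maxBytes;
        // UTF-8 continuation bytes look like 10xxxxxx. If the first dropped byte
        // is a continuation byte, the cut falls inside a character, so step back
        // until the cut sits in front of that character's lead byte.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(utf8, end);
    }
}
```

For the bytes 61 e2 98 ba 63 from the test above and maxBytes = 2, the first dropped byte is 98, a continuation byte, so the cut moves back in front of e2 and only 61 ("a") remains. The result can be shorter than the column size, so it would still need padding before being written as a FixedString.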