asBinary() with TSV format returns incorrect data
Describe the bug
I selected a hash in both TSV and RowBinary formats and got unexpected behaviour:
- asString() returns the same value for both queries, as expected
- asBinary() unexpectedly returns different byte arrays
- the query in RowBinaryWithNamesAndTypes returns the same value as the local hash evaluation
It looks like a bug in parsing non-UTF-8 string values in TSV.
Expected behaviour
All asserts in the code below pass without errors.
Code example
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

import com.clickhouse.client.ClickHouseClient;
import com.clickhouse.client.ClickHouseException;
import com.clickhouse.client.ClickHouseFormat;
import com.clickhouse.client.ClickHouseNodes;
import com.clickhouse.client.ClickHouseProtocol;
import com.clickhouse.client.ClickHouseResponse;
import com.clickhouse.client.ClickHouseValue;

class CHClientBinaryBug {
    public static void main(String[] args) throws ClickHouseException, NoSuchAlgorithmException {
        String message = "abc";
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] targetHash = md.digest(message.getBytes());
        ClickHouseNodes server = ClickHouseNodes.of("http://localhost:8123");
        try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP)) {
            byte[] fromRowBinary;
            String fromRowBinaryAsString;
            try (ClickHouseResponse response = client.connect(server)
                    .format(ClickHouseFormat.RowBinaryWithNamesAndTypes)
                    .query("SELECT SHA512('" + message + "')")
                    .executeAndWait()) {
                ClickHouseValue value = response.firstRecord().getValue(0);
                fromRowBinary = value.asBinary();
                fromRowBinaryAsString = value.asString();
                assert Arrays.equals(fromRowBinary, targetHash);
            }
            byte[] fromTSV;
            String fromTSVAsString;
            try (ClickHouseResponse response = client.connect(server)
                    .format(ClickHouseFormat.TabSeparatedWithNamesAndTypes)
                    .query("SELECT SHA512('" + message + "')")
                    .executeAndWait()) {
                ClickHouseValue value = response.firstRecord().getValue(0);
                fromTSV = value.asBinary();
                fromTSVAsString = value.asString();
            }
            // OK
            assert fromTSVAsString.equals(fromRowBinaryAsString);
            // OK
            assert Arrays.equals(targetHash, fromRowBinary);
            // Error
            assert Arrays.equals(targetHash, fromTSV);
        }
    }
}
Configuration
Environment
- Client version: clickhouse-client-0.3.2-patch10
- Language version: Java 17
- OS: Mac OS Monterey
ClickHouse server
- ClickHouse Server version: 23.3.1.2823
- Empty DB running in Docker
Hi @mixNIK999, apologies for the inconvenience. ClickHouse uses the String data type for both text and binary data. In the Java lib, we use java.lang.String along with the method asBinary(charset) to convert text back to a byte array. Starting from v0.4, String is treated as text in Java by default (based on the majority of use cases I know of), which improves deserialization by ~20%. However, you can still enable binary string support by setting use_binary_string to true, which asks the lib to read the original bytes from ClickHouse and leaves it up to you how to handle them.
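For anyone wondering why the text path can change the bytes: a minimal, JDK-only sketch of the effect, assuming the text path decodes the value as UTF-8 (an assumption for illustration; the class and method names below are hypothetical and not part of the client library). A SHA-512 digest is generally not valid UTF-8, so decoding it to a java.lang.String and encoding it back replaces the malformed sequences, while a single-byte charset such as ISO-8859-1 round-trips every byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class for illustration only; not part of clickhouse-java.
public class BinaryStringRoundTrip {

    // Computes the SHA-512 digest of the message, the same hash the issue compares against.
    static byte[] sha512(String message) {
        try {
            return MessageDigest.getInstance("SHA-512")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Decodes raw bytes into a String and encodes them back with the same charset.
    // Malformed input is replaced during decoding, so the result may differ from the input.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] hash = sha512("abc");
        System.out.println("UTF-8 round trip preserved bytes: "
                + Arrays.equals(hash, roundTrip(hash, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 round trip preserved bytes: "
                + Arrays.equals(hash, roundTrip(hash, StandardCharsets.ISO_8859_1)));
    }
}
```

This only illustrates the charset effect in isolation. Whether the client uses UTF-8 on a given code path depends on the library version and options, so reading the original bytes via use_binary_string is the safer route for hashes.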
Cool, use_binary_string looks like what I need.
However, do I understand correctly that after v0.4 a query using the RowBinary format with the default use_binary_string: false will behave differently for asBinary() (the same as TSV in v0.3)? For example, the test above would fail at the second assert.
This issue has been automatically marked as stale because it has not had activity in the last year. It will be closed in 30 days if no further activity occurs. Please feel free to leave a comment if you believe the issue is still relevant. Thank you for your contributions!
Relates to https://github.com/ClickHouse/clickhouse-java/issues/2263