clickhouse-java

asBinary() with TSV format returns incorrect data

Status: Open · misha-nik opened this issue 2 years ago · 4 comments

Describe the bug

I tried to select a hash in both TSV and RowBinary formats and got unexpected behaviour:

  1. asString() returns the same value for both queries, as expected.
  2. asBinary() unexpectedly returns different byte arrays.
  3. The query in RowBinaryWithNamesAndTypes returns the same value as the locally computed hash.

It looks like a bug in parsing non-UTF-8 string values in TSV.
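A plausible mechanism, assuming the TSV path decodes the column into a java.lang.String as UTF-8 and asBinary() then re-encodes that String: a SHA-512 digest is arbitrary bytes, not valid UTF-8, so the decode step replaces malformed sequences with U+FFFD and the original bytes cannot be recovered. That would also explain why asString() agrees across formats while asBinary() does not. The stdlib-only sketch below reproduces that round trip without a server; the class and method names are illustrative and not part of clickhouse-java.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class Utf8RoundTrip {
    // Local SHA-512, computed the same way as in the report.
    static byte[] sha512(String message) {
        try {
            return MessageDigest.getInstance("SHA-512")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decode raw bytes as UTF-8 text, then re-encode: malformed input becomes U+FFFD,
    // so arbitrary binary data does not survive this round trip.
    static byte[] viaUtf8String(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // ISO-8859-1 maps each byte 0x00..0xFF to exactly one char and back, so this is lossless.
    static byte[] viaLatin1String(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] hash = sha512("abc");
        System.out.println("UTF-8 round trip equal:   " + Arrays.equals(hash, viaUtf8String(hash)));
        System.out.println("Latin-1 round trip equal: " + Arrays.equals(hash, viaLatin1String(hash)));
    }
}
```

The ISO-8859-1 variant is only there for comparison, to show that the loss comes from the UTF-8 decode step; it is not a claim about what the library does internally.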

Expected behaviour

All asserts in the code below pass without errors.

Code example

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

import com.clickhouse.client.ClickHouseClient;
import com.clickhouse.client.ClickHouseException;
import com.clickhouse.client.ClickHouseFormat;
import com.clickhouse.client.ClickHouseNodes;
import com.clickhouse.client.ClickHouseProtocol;
import com.clickhouse.client.ClickHouseResponse;
import com.clickhouse.client.ClickHouseValue;

// Run with the -ea JVM flag (e.g. `java -ea CHClientBinaryBug`): Java skips assert statements by default.
class CHClientBinaryBug {
    public static void main(String[] args) throws ClickHouseException, NoSuchAlgorithmException {

        String message = "abc";
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        // Encode explicitly as UTF-8 so the local hash does not depend on the platform default charset.
        byte[] targetHash = md.digest(message.getBytes(StandardCharsets.UTF_8));

        ClickHouseNodes server = ClickHouseNodes.of("http://localhost:8123");
        try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP)) {
            byte[] fromRowBinary;
            String fromRowBinaryAsString;
            try (ClickHouseResponse response = client.connect(server)
                    .format(ClickHouseFormat.RowBinaryWithNamesAndTypes)
                    .query("SELECT SHA512('" + message +"')")
                    .executeAndWait()) {
                ClickHouseValue value = response.firstRecord().getValue(0);

                fromRowBinary = value.asBinary();
                fromRowBinaryAsString = value.asString();
                assert Arrays.equals(fromRowBinary, targetHash);
            }
            byte[] fromTSV;
            String fromTSVAsString;
            try (ClickHouseResponse response = client.connect(server)
                    .format(ClickHouseFormat.TabSeparatedWithNamesAndTypes)
                    .query("SELECT SHA512('" + message +"')")
                    .executeAndWait()) {
                ClickHouseValue value = response.firstRecord().getValue(0);

                fromTSV = value.asBinary();
                fromTSVAsString = value.asString();
            }

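            // OK: asString() matches across both formats
            assert fromTSVAsString.equals(fromRowBinaryAsString);
            // OK: RowBinary bytes match the locally computed hash
            assert Arrays.equals(targetHash, fromRowBinary);
            // Error: TSV bytes differ from the locally computed hash
            assert Arrays.equals(targetHash, fromTSV);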
        }
    }
}

Configuration

Environment

  • Client version: clickhouse-client-0.3.2-patch10
  • Language version: Java 17
  • OS: macOS Monterey

ClickHouse server

  • ClickHouse Server version: 23.3.1.2823
  • Empty DB running in Docker

(misha-nik, Jul 11 '23)

Hi @mixNIK999, apologies for the inconvenience. ClickHouse uses the String data type for both text and binary data. In the Java lib, we use java.lang.String along with the method asBinary(charset) to convert text back to a byte array. Starting from v0.4, String is treated as text in Java by default (based on the majority of use cases I know of), which improves deserialization by ~20%. However, you can still enable binary string support by setting use_binary_string to true, which asks the lib to read the original bytes from ClickHouse; it's then up to you how to deal with them.

(zhicwu, Jul 11 '23)
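For context on why RowBinary can preserve the hash: in RowBinary a ClickHouse String is written as an unsigned LEB128 (varint) length followed by the raw bytes, with no charset involved. A reader that copies those bytes straight into a byte[], which is what use_binary_string=true is described as enabling, gets the digest back unchanged. A minimal stdlib-only sketch of that layout; encode and decode are hypothetical helpers, not clickhouse-java APIs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

class RowBinaryStringSketch {
    // Lays out one String value as RowBinary does: unsigned LEB128 length, then raw bytes.
    static byte[] encode(byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int len = value.length;
        do {
            int b = len & 0x7F;
            len >>>= 7;
            out.write(len != 0 ? (b | 0x80) : b); // high bit set means "more length bytes follow"
        } while (len != 0);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // Reads the varint length, then copies exactly that many bytes without any decoding.
    static byte[] decode(ByteBuffer in) {
        long len = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            len |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        byte[] value = new byte[(int) len];
        in.get(value);
        return value;
    }

    public static void main(String[] args) {
        byte[] binary = new byte[64];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) (0xFF - i); // many of these bytes are invalid as UTF-8
        }
        byte[] wire = encode(binary);
        byte[] back = decode(ByteBuffer.wrap(wire));
        System.out.println("bytes preserved: " + Arrays.equals(binary, back));
    }
}
```

Because nothing in this path interprets the bytes as text, the result is the same whether or not they form valid UTF-8.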

Cool, use_binary_string looks like what I need. However, do I understand correctly that from v0.4 on, a query using the RowBinary format with the default use_binary_string=false will behave differently for asBinary() (the same way TSV did in v0.3)? For example, the test above would fail at the second assert.

(misha-nik, Jul 12 '23)
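Independently of the use_binary_string default, one workaround, assuming the query can be changed, is to have ClickHouse return the digest as text: SELECT hex(SHA512('abc')) yields only the characters 0-9 and A-F, which survive any UTF-8 decode, and the client converts them back with java.util.HexFormat (Java 17+). The sketch below simulates the server value locally instead of connecting; HexWorkaround and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HexFormat;

class HexWorkaround {
    // Local reference hash, as in the report.
    static byte[] localSha512(String message) {
        try {
            return MessageDigest.getInstance("SHA-512")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for the string that hex(SHA512('abc')) would return; ClickHouse's hex()
    // emits uppercase digits, so the simulation does too.
    static String simulatedServerHex(String message) {
        return HexFormat.of().withUpperCase().formatHex(localSha512(message));
    }

    // parseHex accepts both uppercase and lowercase hex digits.
    static byte[] fromHex(String hex) {
        return HexFormat.of().parseHex(hex);
    }

    public static void main(String[] args) {
        String fromServer = simulatedServerHex("abc");
        byte[] decoded = fromHex(fromServer);
        System.out.println("matches local hash: " + Arrays.equals(decoded, localSha512("abc")));
    }
}
```

With the client from the report, the hex string would come from value.asString() on the hex(...) column; that wiring is omitted so the sketch runs without a server. The trade-off is twice the bytes on the wire for this column.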

This issue has been automatically marked as stale because it has not had activity in the last year. It will be closed in 30 days if no further activity occurs. Please feel free to leave a comment if you believe the issue is still relevant. Thank you for your contributions!

(github-actions[bot], Jan 08 '25)

Relates to https://github.com/ClickHouse/clickhouse-java/issues/2263

(chernser, Nov 14 '25)