
Clickhouse Internal Error when data contains "DB::Exception"

Open · zella opened this issue 1 year ago · 1 comment

Our data can contain "DB::Exception", for example Google search history.

When querying this data we get: com.crobox.clickhouse.ClickhouseException: Clickhouse Internal Error

Is there maybe a better way to handle errors in ClickHouse?
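
For illustration, a minimal sketch of the kind of substring check that seems to cause this (a simplification, not the library's actual code; the real location is linked in the reply below): any result row that happens to contain the text "DB::Exception" is classified as a server error.

object BodySniffFalsePositive extends App {
  // Simplified stand-in for the parser's error detection: treat any body
  // that contains "DB::Exception" as a ClickHouse error.
  def looksLikeServerError(body: String): Boolean =
    body.contains("DB::Exception")

  // A perfectly valid result row, e.g. a stored search-history entry:
  val legitimateRow = "how to handle DB::Exception in clickhouse\t1"

  // Prints "true": the data is indistinguishable from an error message.
  println(looksLikeServerError(legitimateRow))
}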

zella · May 24 '23 10:05

Thanks for bringing up this issue, you are right; see https://github.com/crobox/clickhouse-scala-client/blob/f1fefeb4f135b4822e612878a0b7caba2ac0e305/client/src/main/scala/com.crobox.clickhouse/internal/ClickhouseResponseParser.scala#L30

The problem though is what would be a better way to detect errors: this is an issue when the server has already started streaming, so the HTTP headers (and the status code) have already been sent. One thing I can think of is that if send_progress_in_http_headers is set to 0 we skip the error check on the body, since in that case the HTTP status code should report the error?
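
A rough sketch of that idea (hypothetical names, not a patch against the actual parser): only fall back to scanning the streamed body for "DB::Exception" when progress headers were requested for the query, and otherwise trust the HTTP status code; the two curl transcripts below show why the status code is only reliable in the second case.

object ErrorDetectionSketch {
  // Hypothetical sketch: `statusCode`, `progressHeadersEnabled` and `body`
  // stand in for whatever the real parser has available at this point.
  def detectError(statusCode: Int, progressHeadersEnabled: Boolean, body: String): Option[String] =
    if (statusCode >= 400)
      // Without progress headers the server can still report the failure in
      // the status line (the 500 in the second transcript below).
      Some(body)
    else if (progressHeadersEnabled && body.contains("DB::Exception"))
      // The 200 was already sent before the query failed mid-stream, so the
      // error can only appear in the body; this keeps the current
      // (false-positive-prone) check as a last resort.
      Some(body)
    else
      None // treat the body as data
}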

curl -v "http://localhost:8123/query?send_progress_in_http_headers=1&max_rows_to_read=10000000&query=SELECT%20uniqExact(*)%20FROM%20system.numbers"
*   Trying 127.0.0.1:8123...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8123 (#0)
> GET /query?send_progress_in_http_headers=1&max_rows_to_read=10000000&query=SELECT%20uniqExact(*)%20FROM%20system.numbers HTTP/1.1
> Host: localhost:8123
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Wed, 24 May 2023 12:55:10 GMT
< Connection: Keep-Alive
< Content-Type: text/tab-separated-values; charset=UTF-8
< X-ClickHouse-Server-Display-Name: fcb60aa4691f
< Transfer-Encoding: chunked
< X-ClickHouse-Query-Id: aa986737-3f90-47e9-91c3-46d49d1bc956
< X-ClickHouse-Format: TabSeparated
< X-ClickHouse-Timezone: UTC
< Keep-Alive: timeout=3
< X-ClickHouse-Progress: {"read_rows":"2227170","read_bytes":"17817360","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}
< X-ClickHouse-Progress: {"read_rows":"5764440","read_bytes":"46115520","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}
< X-ClickHouse-Progress: {"read_rows":"8515650","read_bytes":"68125200","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}
< X-ClickHouse-Summary: {"read_rows":"10022265","read_bytes":"80178120","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}
< 
Code: 158. DB::Exception: Limit for rows or bytes to read exceeded, max rows: 10.00 million, current rows: 10.02 million: While executing Numbers. (TOO_MANY_ROWS) (version 22.3.15.34.altinitystable (altinity build))
* Connection #0 to host localhost left intact

versus:

curl -v "http://localhost:8123/query?max_rows_to_read=10000000&query=SELECT%20uniqExact(*)%20FROM%20system.numbers"
*   Trying 127.0.0.1:8123...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8123 (#0)
> GET /query?max_rows_to_read=10000000&query=SELECT%20uniqExact(*)%20FROM%20system.numbers HTTP/1.1
> Host: localhost:8123
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Date: Wed, 24 May 2023 13:05:04 GMT
< Connection: Keep-Alive
< Content-Type: text/tab-separated-values; charset=UTF-8
< X-ClickHouse-Server-Display-Name: fcb60aa4691f
< Transfer-Encoding: chunked
< X-ClickHouse-Query-Id: be29836c-f930-4207-b23d-370dc17e8051
< X-ClickHouse-Format: TabSeparated
< X-ClickHouse-Timezone: UTC
< X-ClickHouse-Exception-Code: 158
< Keep-Alive: timeout=3
< X-ClickHouse-Summary: {"read_rows":"0","read_bytes":"0","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}
< 
Code: 158. DB::Exception: Limit for rows or bytes to read exceeded, max rows: 10.00 million, current rows: 10.02 million: While executing Numbers. (TOO_MANY_ROWS) (version 22.3.15.34.altinitystable (altinity build))
* Connection #0 to host localhost left intact
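
The second transcript also carries the error in the X-ClickHouse-Exception-Code header (158 above), so when progress headers are off the exception code could be surfaced without looking at the body at all; a small sketch, assuming headers are available as plain name/value pairs rather than the client's actual header model:

import scala.util.Try

object ExceptionCodeSketch {
  // Hypothetical helper: pull the ClickHouse error code out of the response
  // headers (only present when the server could still report the failure).
  def clickhouseExceptionCode(headers: Seq[(String, String)]): Option[Int] =
    headers.collectFirst {
      case (name, value) if name.equalsIgnoreCase("X-ClickHouse-Exception-Code") => value
    }.flatMap(v => Try(v.trim.toInt).toOption)
}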

sjoerdmulder · May 24 '23 13:05