click_house
`.select_all` and `.select_one` methods return inconsistent data when # of rows is above a certain threshold
Hey @shlima!

During local testing, I'm noticing inconsistent data returned by the `.select_all` and `.select_one` methods. Specifically, it happens once a certain amount of data has been inserted into the table (in my local dev I only have ~2.5k rows).

Here's an example with `.select_one`:

As you can see, the first example returns an opening bracket cast as a string. The second example returns what I'd usually expect.
Here's an example with `.select_all`:

Here, the first example returns raw JSON, which is not expected.
My hunch is that Faraday is struggling to deserialize the JSON response (because of the response size), which causes the above issues. I don't have a proposed fix, but I'm wondering if you've encountered this before and have suggestions on how to move forward?
Happened to me too, got the same! As a short-term fix we're just moving everything to use the `execute` method.
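For anyone taking the same route: a minimal sketch of the idea, assuming you run the query with `FORMAT JSON` and parse the raw body yourself with Ruby's stdlib `JSON`, instead of relying on the gem's typed casting. The `rows_from` helper and the sample body below are illustrative, not part of the gem:

```ruby
require 'json'

# Hypothetical helper: parse a raw ClickHouse `FORMAT JSON` response body
# and return just the data rows, bypassing the gem's deserialization.
def rows_from(body)
  JSON.parse(body).fetch('data')
end

# Abridged shape of a ClickHouse FORMAT JSON response
# (UInt64 values are quoted as strings by default):
body = <<~JSON
  {
    "meta": [{ "name": "id", "type": "UInt64" }],
    "data": [{ "id": "1" }, { "id": "2" }],
    "rows": 2
  }
JSON

rows_from(body) # => [{"id"=>"1"}, {"id"=>"2"}]
```

With the gem, the raw body would come from something like `connection.execute('SELECT id FROM t FORMAT JSON').body`, though the exact call depends on your version.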
Sorry, but I cant reproduce the issue
Also have it
It happened after we updated ClickHouse from 21.3 to 21.8. I noticed that this might be related to the way ClickHouse sends the response over the HTTP interface: in my tests it sometimes sent a single-line string, and sometimes a pretty-formatted JSON with newlines. My theory is that in this case Faraday doesn't parse the response.
The issue is very hard to catch. Sometimes it happens very rarely, then more often, then rarely again, and it almost doesn't seem to depend on anything. Very strange!
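One note on that theory: Ruby's stdlib parser itself is indifferent to whitespace, so single-line vs. pretty-printed output alone shouldn't matter. If the formatting correlates with the failures, the breakage is more likely in whether Faraday's parsing middleware runs at all (e.g. content-type matching) than in the parse step itself. A quick self-contained check:

```ruby
require 'json'

# The same payload, compact and pretty-printed:
compact = '{"data":[{"id":"1"}],"rows":1}'
pretty  = <<~JSON
  {
    "data": [
      { "id": "1" }
    ],
    "rows": 1
  }
JSON

# Both forms parse to the same structure; newlines alone
# cannot break JSON.parse.
JSON.parse(compact) == JSON.parse(pretty) # => true
```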
I'm trying this monkey patch
@mib32 You should also capture `faraday.headers`, especially the `content-type` header; that will help a lot.
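When logging captured headers, keep in mind that header names can vary in case between servers and client versions. A small, self-contained way to pull the value out regardless of case (`content_type_of` is a hypothetical helper, not gem or Faraday API):

```ruby
# Hypothetical helper: case-insensitive lookup of the Content-Type
# header from a plain headers hash captured off a response.
def content_type_of(headers)
  entry = headers.find { |name, _| name.to_s.downcase == 'content-type' }
  entry && entry.last
end

content_type_of('Content-Type' => 'application/json; charset=UTF-8')
# => "application/json; charset=UTF-8"
```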
@shlima Maybe I'll try one more time later, but I did, and I saw that `content-type` was `application/json` :( I don't remember the other ones.
Monkey patch works well
We hit this issue as well, I believe this is connected with https://github.com/ClickHouse/ClickHouse/issues/33308
For what it's worth, we also hit this, and digging into it with @anselmodadams was what actually led to ClickHouse/ClickHouse#33308. We think the root cause is not in this gem, so I didn't raise it here. In the meantime we have a monkey patch that removes the `send_progress_in_http_headers` parameter from `ClickHouse::Connection`, like this:
```ruby
module ClickHouse::HttpProgressWorkaround
  def get(path = "/", body: "", query: {}, database: config.database)
    if query.is_a?(String)
      query = { query: query }
      config.logger!.warn('since v1.4.0 use connection.get(body: "SELECT 1") instead of connection.get(query: "SELECT 1")')
    end

    transport.get(path) do |conn|
      conn.params = query.merge(database: database).compact
      conn.body = body
    end
  end

  def compose(path, query = {})
    "#{path}?#{URI.encode_www_form(query.compact)}"
  end
end

ClickHouse::Connection.prepend ClickHouse::HttpProgressWorkaround
```
That said, this workaround may not be suitable for others, because it makes the logged reports about the number of rows read all say zero (the HTTP progress reports are where that information is extracted from). It's also brittle, as it copy-pastes the gem code; I couldn't see an easy way to "add" to these methods to remove the `send_progress_in_http_headers` parameter (compare with the methods in `ClickHouse::Connection`).