click_house icon indicating copy to clipboard operation
click_house copied to clipboard

`.select_all` and `.select_one` methods return inconsistent data when # of rows is above a certain threshold

Open jlholm opened this issue 3 years ago • 8 comments

Hey @shlima!

During local testing, I'm noticing inconsistent data returned by the .select_all and .select_one methods. Specifically, when there is X amount of data inserted into the table (in my local dev I only have ~2.5k rows).

Here's an example with .select_one:

Screen Shot 2021-10-08 at 6 43 44 PM

As you can see, the first example returns an opening bracket cast as a string. The second example returns what I'd usually expect.

Here's an example with .select_all:

Screen Shot 2021-10-08 at 7 01 05 PM

Here, the first example is returning raw JSON, this is not expected.

I have an inclination that Faraday is struggling to deserialize the JSON response (because of the response size) which causes the above issues. I don't have a proposed fix but I'm wondering if you've encountered this before and have suggestions on how to move forward?

jlholm avatar Oct 08 '21 23:10 jlholm

Happened to me too, got the same! As a short term fix we're just moving everything to use the execute method

ferrucc-io avatar Oct 27 '21 09:10 ferrucc-io

Sorry, but I cant reproduce the issue

shlima avatar Jan 21 '22 17:01 shlima

Also have it

mib32 avatar Apr 11 '22 14:04 mib32

It happened after we updated clickhouse from 21.3 to 21.8 version. I noticed that this might be related to the way CH sends the response in HTTP interface - in my tests sometimes it sent it as single-line string, sometimes as a pretty formatted json with newlines. And my theory is that in this case the Faraday doesn't parse the response.

The issue is very hard to catch. It happens sometimes very rare, but then it can be more often, then rare again. And it almost doesn't depend on anything. Very strange!

I'm trying this monkey patch image

mib32 avatar Apr 11 '22 21:04 mib32

@mib32 You should capture faraday.headers also, especially content-type header, this will help a lot.

shlima avatar Apr 11 '22 21:04 shlima

@shlima Maybe I'll try later one more time, but I did, and I saw that content-type was application/json :( I don't remember the other ones.

Monkey patch works good

mib32 avatar Apr 12 '22 05:04 mib32

We hit this issue as well, I believe this is connected with https://github.com/ClickHouse/ClickHouse/issues/33308

madejejej avatar Apr 25 '22 12:04 madejejej

For what it's worth, we also hit this and digging into this issue with @anselmodadams was what actually led to ClickHouse/ClickHouse#33308. We think the root cause is not in this gem, so I didn't raise it here. In the meantime we have a monkey-patch that removes the send_progress_in_http_headers parameter from ClickHouse::Connection, like this:

module ClickHouse::HttpProgressWorkaround
  def get(path = "/", body: "", query: {}, database: config.database)
    if query.is_a?(String)
      query = { query: query }
      config.logger!.warn('since v1.4.0 use connection.get(body: "SELECT 1") instead of connection.get(query: "SELECT 1")')
    end

    transport.get(path) do |conn|
      conn.params = query.merge(database: database).compact
      conn.body = body
    end
  end

  def compose(path, query = {})
    "#{path}?#{URI.encode_www_form(query.compact)}"
  end
end

ClickHouse::Connection.prepend ClickHouse::HttpProgressWorkaround

That said, this workaround may not be suitable for others, because it causes the logging reports about the number of rows read to all say zero (because the HTTP progress reports are where it extracts this information from). It's also brittle, as it copy-pastes the gem code, as I couldn't see an easy way to "add" to these methods to remove the send_progress_in_http_headers header (compare to the methods in ClickHouse::Connection).

czlee avatar May 31 '22 21:05 czlee