graphite-clickhouse icon indicating copy to clipboard operation
graphite-clickhouse copied to clipboard

graphite-clickhouse doesn't close connection to clickhouse when request was canceled

Open perederyaev opened this issue 6 months ago • 0 comments

Hi! I see a lot of long-running (up to 1 minute) requests from graphite-clickhouse (v0.13.4) in our clickhouse.

For reproducing the issue I make request to graphite-clickhouse with curl:

curl 'localhost:6000/render/?format=pickle&target= _________&from=1748882404&until=1749757204' -v -m2
*   Trying 127.0.0.1:6000...
* Connected to localhost (127.0.0.1) port 6000 (#0)
> GET /render/?format=pickle&target= _________&from=1748882404&until=1749757204 HTTP/1.1
> Host: localhost:6000
> User-Agent: curl/7.76.1
> Accept: */*
>
* Operation timed out after 2004 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 2004 milliseconds with 0 bytes received

curl ends in 2 second and I see "context canceled" in the graphite-clickhouse log:

[2025-06-12T23:21:50.643+0300] INFO [render] query {"request_id": "968928b6d1eefe1cc2ce18b262bfa9a8", "query": "SELECT Path FROM graphite_index WHERE ((Level=4) AND (Path LIKE 'rb.rb%' AND match(Path, '_________'))) AND (Date >='2025-06-02' AND Date <= '2025-06-12') GROUP BY Path FORMAT TabSeparatedRaw", "total_rows_to_read": "7909864", "read_rows": "7909864", "read_bytes": "416941622", "written_rows": "0", "written_bytes": "0", "query_id": "968928b6d1eefe1cc2ce18b262bfa9a8::d9e0535b5257f227", "time": 0.132063994}
[2025-06-12T23:21:50.646+0300] INFO [render] finder {"request_id": "968928b6d1eefe1cc2ce18b262bfa9a8", "metrics": 6972, "find_cached": false}
[2025-06-12T23:21:52.513+0300] ERROR [render] data_parser {"request_id": "968928b6d1eefe1cc2ce18b262bfa9a8", "error": "context canceled", "read_bytes": 0, "runtime": "0s", "runtime_ns": 0}
[2025-06-12T23:21:52.513+0300] INFO [http] access {"request_id": "968928b6d1eefe1cc2ce18b262bfa9a8", "time": 2.002144606, "method": "GET", "url": "/render/?format=pickle&target=rb.rb*.rbmon.*&from=1748882404&until=1749757204", "peer": "[::1]:60226", "client": "", "status": 504, "find_cached": false}
[2025-06-12T23:21:57.802+0300] ERROR [render] reader {"request_id": "968928b6d1eefe1cc2ce18b262bfa9a8", "error": "context canceled"}

But query for graphite_reverse table in clickhouse continues to run up to 1 minute and ends with clickhouse error "Code: 24. DB::Exception: Cannot write to ostream at offset 18170421".

I've done some debugging and found that graphite-clickhouse doesn't close connection to clickhouse when request was canceled (curl finishes running) and waits for data-timeout (in my case, data-timeout = "1m0s") and only then disconnects from clickhouse.

That leads to many orphans queries in clickhouse and spare load on it.

Could you please fix the issue or provide any workaround?

perederyaev avatar Jun 12 '25 20:06 perederyaev