ClickHouse crashes under concurrent queries
Problem
Getting this error in the ClickHouse DB logs:
<Error> DynamicQueryHandler: Cannot send exception to client: Code: 24. DB::Exception: Cannot write to ostream at offset 1289. (CANNOT_WRITE_TO_OSTREAM), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xb3ac1da in /usr/bin/clickhouse
1. DB::WriteBufferFromOStream::nextImpl() @ 0xb4b9a32 in /usr/bin/clickhouse
2. DB::WriteBufferFromHTTPServerResponse::nextImpl() @ 0x167ab677 in /usr/bin/clickhouse
3. DB::HTTPHandler::trySendExceptionToClient(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, DB::HTTPServerRequest&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0x167300c7 in /usr/bin/clickhouse
4. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0x16731995 in /usr/bin/clickhouse
5. DB::HTTPServerConnection::run() @ 0x167a44fb in /usr/bin/clickhouse
6. Poco::Net::TCPServerConnection::start() @ 0x19a6608f in /usr/bin/clickhouse
7. Poco::Net::TCPServerDispatcher::run() @ 0x19a684e1 in /usr/bin/clickhouse
8. Poco::PooledThread::run() @ 0x19c25e69 in /usr/bin/clickhouse
9. Poco::ThreadImpl::runnableEntry(void*) @ 0x19c231c0 in /usr/bin/clickhouse
10. ? @ 0x7fb0db337609 in ?
11. __clone @ 0x7fb0db25c133 in ?
(version 22.3.20.29 (official build))
Error log in Cube.js:
Error: socket hang up
at QueryQueue.parseResult (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:434:13)
at QueryQueue.executeQueryInQueue (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:320:21)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at PreAggregationLoader.loadPreAggregationWithKeys (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:790:7)
at PreAggregationLoader.loadPreAggregation (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:573:22)
at preAggregationPromise (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:2149:30)
at QueryOrchestrator.fetchQuery (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryOrchestrator.ts:241:9)
at OrchestratorApi.executeQuery (/usr/src/app/node_modules/@cubejs-backend/server-core/src/core/OrchestratorApi.ts:98:20)
at /usr/src/app/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:602:13
at async Promise.all (index 0)
at RefreshScheduler.refreshPreAggregations (/usr/src/app/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:587:5)
at async Promise.all (index 1)
at RefreshScheduler.runScheduledRefresh (/usr/src/app/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:285:9)
**Increasing the timeout still did not work**
CUBEJS_DB_QUERY_TIMEOUT: 30s
Hey @ranjeetranjan 👋 Is this still a standing issue? If so, does it reproduce?
Yes @igorlukanin, I am still facing the same issues.
Here are the details:
- Data size: approx. 35 GB.
- Query date range: 1 year.
- The dashboard has approx. 20 cards/charts in total.
- Some queries fail with the hang-up error.
After another look at the error/data size/concurrency, I suspect that the issue is that ClickHouse doesn't handle the load well and the error is related to ClickHouse rather than Cube. It looks like your ClickHouse instance crashes and then Cube gets no response. Would you kindly try raising an issue with the ClickHouse team? https://github.com/ClickHouse/ClickHouse
Hi @igorlukanin! Hope you are well. Based on the log message, ClickHouse thinks the client went away.
<Error> DynamicQueryHandler: Cannot send exception to client: Code: 24. DB::Exception: Cannot write to ostream at offset 1289. (CANNOT_WRITE_TO_OSTREAM), Stack trace (when copying this message, always include the lines below):
This appears when a client disconnects from ClickHouse and we cannot write results to the network. This can happen for a variety of reasons. Here are a few that come to mind.
- Keepalive is not enabled on the TCP connection (see the curl sketch after this list).
- The connection goes through a load balancer and the load balancer dropped the connection, perhaps due to a timeout.
- The application received an error and disconnected.
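If keepalive is a suspect, you can rule it out from the client side. Here is a sketch (host and credentials are placeholders): curl can send TCP keep-alive probes on the connection and cap the total request time.
# hypothetical check: keep-alive probes after 30s idle, 10-minute client-side cap
time curl --keepalive-time 30 --max-time 600 --user user:pass 'http://clickhouse-host:8123?query=SELECT+1'
If the query succeeds with these flags but the application still sees a hang-up, the drop is likely happening between your client and ClickHouse rather than inside the server.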
Is it possible to capture the query that is failing and run it by hand using curl? That would be instructive to see if there's another error at work. Here's an example:
time curl http://localhost:8123?query=select+1
1
real 0m0.019s
user 0m0.009s
sys 0m0.004s
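It may also be worth checking ClickHouse's own record of the query. Assuming query logging is enabled (it is by default), system.query_log shows whether the server finished the query or raised its own error after the client had gone away. Something along these lines, again with placeholder credentials:
# hypothetical query_log check: recent queries that ended with an exception
curl --user user:pass 'http://clickhouse-host:8123' --data-binary "SELECT event_time, type, exception FROM system.query_log WHERE event_date = today() AND exception != '' ORDER BY event_time DESC LIMIT 10"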
@hodgesrm Thanks for your views.
The connection goes through a load balancer and the load balancer dropped the connection, perhaps due to a timeout.
We are not using any load balancer in our case. The worker node connects directly to the ClickHouse pod within Kubernetes.
Based on the suggestion, I ran the query where the application gets a timeout:
time curl --user user:pass 'http://clickhouse-host:8123?query=SELECT%20max%28timestamp%29%20FROM%20db_name.example_table%20WHERE%20toDate%28timestamp%29%20BETWEEN%20%272023-03-01%27%20AND%20%272024-04-04%27'
real 0m4.912s
user 0m0.008s
sys 0m0.000s
@igorlukanin Your response will be highly appreciated.
@ranjeetranjan What is your CUBEJS_CONCURRENCY env var setting? Could you please check what happens if you increase or decrease it?
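For example, in your .env (the right value depends on your workload; this line is just an illustration):
# illustrative value, not a recommendation
CUBEJS_CONCURRENCY=2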
@igorlukanin Thanks for your response. I did not set CUBEJS_CONCURRENCY.
Could you please check what happens if you increase or decrease it?
I tried increasing and decreasing it, but nothing worked. When I tried Cube Cloud with the same DB and schema, it worked perfectly; we did not find any issues.
Here is the .env configuration of my local development:
## Timeout
CUBESTORE_QUERY_TIMEOUT=60s
CUBEJS_DB_QUERY_TIMEOUT=10m
# Other
NODE_OPTIONS="--max-old-space-size=12000"
Perhaps I am missing something minor.