pipeline function in QueryCache fails with ECONNRESET for lambda rollup pre-aggregations

Open viktordebulat opened this issue 7 months ago • 1 comments

Describe the bug

We are encountering an issue with the pipeline call in QueryCache.ts when processing lambda rollup pre-aggregations for ClickHouse. Specifically, the error occurs while data is streamed from tableData.rowStream to the writer. It does not affect every request, but for certain request types it happens reliably, approximately 2 seconds into the request:

Error: aborted
    at TLSSocket.socketCloseListener (node:_http_client:478:19)
    at TLSSocket.emit (node:events:530:35)
    at node:net:351:12
    at TCP.done (node:_tls_wrap:650:7) {
  code: 'ECONNRESET'
}

The problem goes away when the affected code is replaced with direct iterator processing:

const iterator = tableData.rowStream[Symbol.asyncIterator]();
let result = await iterator.next();
while (!result.done) {
  writer.write(result.value);
  result = await iterator.next();
}
writer.end();

This workaround resolves the issue, but it bypasses the pipeline utility, which is designed to handle stream piping and error propagation.
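For comparison, here is a minimal sketch of the same manual loop with backpressure and error handling added back in. The workaround above never checks the return value of writer.write, so rows can accumulate in memory if the writer is slower than the source. The helper name copyRows is hypothetical and is not part of Cube; it only assumes that the source is an async iterable and the destination is a Node.js Writable.

```typescript
import { once } from 'node:events';
import { Writable } from 'node:stream';
import { finished } from 'node:stream/promises';

// Hypothetical helper (not Cube's implementation): copies rows from an
// async-iterable source into a Writable while honoring backpressure.
export async function copyRows(
  rowStream: AsyncIterable<unknown>,
  writer: Writable,
): Promise<void> {
  try {
    for await (const row of rowStream) {
      // write() returns false once the writer's buffer is full;
      // wait for 'drain' before pulling the next row from the source.
      // once() rejects if the writer emits 'error' while we wait.
      if (!writer.write(row)) {
        await once(writer, 'drain');
      }
    }
  } catch (err) {
    // Propagate a source or writer failure by tearing down the writer.
    writer.destroy(err instanceof Error ? err : new Error(String(err)));
    throw err;
  }
  writer.end();
  // Resolve only after every buffered row has been flushed.
  await finished(writer);
}
```

This keeps the explicit iteration that avoided the error in my case, but it is still only a sketch: it has not been tested against the ClickHouse driver's row stream.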

To Reproduce

Steps to reproduce the behavior:

Create a cube with ClickHouse as the data source and declare a rollup and a lambda rollup in the pre_aggregations section.
Trigger a request that processes a large dataset or is served by the lambda rollup.

Expected behavior

The pipeline function should handle the streaming of data without prematurely aborting due to ECONNRESET.

Minimally reproducible Cube Schema

cubes:
  - name: cube_total
    sql: >
       some select sql here

    measures:
      - name: total_transactions
        sql: transaction_id
        type: count

      - name: total_amount
        sql: amount
        type: sum

      - name: total_payout
        sql: payout
        type: sum

    dimensions:
      - name: user_id
        sql: user_id
        type: string

      - name: currency
        sql: currency
        type: string

      - name: at
        sql: at
        type: time

    pre_aggregations:
      - name: cube_total_rollup_lambda
        type: rollup_lambda
        union_with_source_data: true
        rollups:
          - CUBE.cube_total_rollup

      - name: cube_total_rollup
        type: rollup
        measures:
          - cube_total.total_transactions
          - cube_total.total_amount
          - cube_total.total_payout
        dimensions:
          - cube_total.user_id
          - cube_total.currency
        indexes:
          - name: user_rollup_user_id_index
            columns:
              - cube_total.user_id
        time_dimension: cube_total.at
        granularity: quarter
        external: true
        partition_granularity: quarter
        refresh_key:
          every: 1 day

Version: Cube: 1.3.10, 1.3.11, 1.3.12... ClickHouse: 25.3

Additional context

The issue occurs specifically for lambda rollup pre-aggregations. Other request types using the same pipeline function do not exhibit this behavior. This suggests the issue may be related to the characteristics of the tableData.rowStream for these specific requests.

I suggest improving the error handling around pipeline and possibly adding retries.
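To illustrate the kind of retry I have in mind, here is a rough sketch built on pipeline from node:stream/promises. The names pipelineWithRetry, createSource and createSink are hypothetical and do not exist in Cube. The sketch assumes that both the source and the sink can be recreated from scratch for each attempt, which may not hold for the real writer: if rows from a failed attempt have already been written, a blind retry would duplicate them.

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper (not part of Cube): reruns a pipeline when it fails
// with ECONNRESET. Assumes both ends can be rebuilt for every attempt.
export async function pipelineWithRetry(
  createSource: () => Readable,
  createSink: () => Writable,
  maxAttempts = 3,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      // pipeline() destroys both streams on failure and rejects with the error.
      await pipeline(createSource(), createSink());
      return;
    } catch (err) {
      const code = (err as NodeJS.ErrnoException).code;
      // Retry only connection resets, and only while attempts remain.
      if (code !== 'ECONNRESET' || attempt >= maxAttempts) {
        throw err;
      }
    }
  }
}
```

Any other error, or an ECONNRESET on the last attempt, is rethrown unchanged so the caller still sees the original failure.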

May 13 '25 16:05 viktordebulat

This is blocking me from using lambda rollups with ClickHouse (I haven't tested other databases). In the meantime, I had to build a custom image to unblock myself.

I'm asking for advice on a proper resolution of this issue, or for suggestions as to why it might be failing.

May 13 '25 16:05 viktordebulat