
[BUG] Internal DataFusion error on unary minus with Decimal types causes service hang and client timeouts during SLT run

Open YevheniiNiestierov opened this issue 5 months ago • 0 comments

The SLT runner hangs and eventually times out on a query that uses the unary minus operator. The Embucket logs show an internal DataFusion error while executing this query. Shortly afterwards the service stops accepting connections, which suggests that it crashes or hangs, and the client sees a read timeout followed by connection failures.

The runner hangs on the following query:

SELECT * FROM tab3 AS cor0 WHERE NOT col3 * col1 * - 56 IS NOT NULL;
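For reference, the result this query is expected to produce can be checked against SQLite, where `NOT` binds more loosely than `IS NOT NULL`, so the predicate reads as `NOT ((col3 * col1 * (-56)) IS NOT NULL)` and selects the rows whose product is NULL. The sketch below is only an illustration: the `tab3` schema and rows are hypothetical (the real SLT table definition is not shown in this issue), and an `ORDER BY` is added to make the output deterministic.

```python
import sqlite3


def run_query():
    """Run the failing predicate against a hypothetical tab3 in SQLite."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only the columns the predicate touches, plus a key.
    conn.execute(
        "CREATE TABLE tab3 (pk INTEGER PRIMARY KEY, col1 REAL, col3 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tab3 VALUES (?, ?, ?)",
        [
            (1, 2.5, 3),      # product is non-NULL: filtered out
            (2, None, 3),     # col1 is NULL: product is NULL, row is kept
            (3, 2.5, None),   # col3 is NULL: product is NULL, row is kept
        ],
    )
    rows = conn.execute(
        "SELECT * FROM tab3 AS cor0 "
        "WHERE NOT col3 * col1 * - 56 IS NOT NULL "
        "ORDER BY pk"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_query())  # [(2, None, 3), (3, 2.5, None)]
```

With this data the unary minus only changes the sign of a non-NULL product, so the query should return exactly the rows where `col1` or `col3` is NULL rather than raising an error.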

Runner Timeout Error:

250003: Failed to execute request: HTTPConnectionPool(host='localhost', port=3000): Read timed out. (read timeout=60)

Runner Connection Refused Error (on retry):

250003: Failed to execute request: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /queries/v1/query-request?requestId=abfad778-af58-453f-a46f-7d49ba0577e9 (Caused by NewConnectionError('<snowflake.connector.vendored.urllib3.connection.HTTPConnection object at 0x10b1bcfb0>: Failed to establish a new connection: [Errno 61] Connection refused'))

The Embucket logs show an internal DataFusion error raised when the unary minus (-) is applied to a Decimal128(None,38,10) scalar value. The error message itself says this was likely caused by a bug in DataFusion.

Critical Internal Error in Embucket:

{"timestamp":"2025-07-03T19:22:02.114597Z","level":"ERROR","fields":{"error":"DataFusion error: Internal error: Can not run arithmetic negative on scalar value Decimal128(None,38,10).\nThis was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker"},"target":"core_executor::query"}

This error propagates up through the service layers:

{"timestamp":"2025-07-03T19:22:02.191492Z","level":"ERROR","fields":{"message":"DataFusion error: Internal error: Can not run arithmetic negative on scalar value Decimal128(None,38,10).\nThis was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker\n0: <transparent>\n1: DataFusion error: Internal error: Can not run arithmetic negative on scalar value Decimal128(None,38,10).\nThis was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker, at crates/core-executor/src/query.rs:1443:22\n2: Internal(\"Can not run arithmetic negative on scalar value Decimal128(None,38,10)\")"},"target":"api_snowflake_rest::error","span":{"name":"api-snowflake-rest::Error::into_response"},"spans":[{"name":"api-snowflake-rest::Error::into_response"}]}

Finally, about two seconds later the telemetry exporter reports a TCP connect error, which is consistent with the service having become unavailable:

{"timestamp":"2025-07-03T19:22:03.968359Z","level":"ERROR","fields":{"message":"","name":"BatchSpanProcessor.ExportError","error":"Operation failed: status: Unavailable, message: \"tcp connect error\", details: [], metadata: MetadataMap { headers: {} }"},"target":"opentelemetry_sdk"}

YevheniiNiestierov, Jul 04 '25 10:07