Unexpected behavior with Cassandra connector
Describe the bug
We are using Apache Drill to add ANSI SQL capabilities to Cassandra. When a query filters data with the '>', '<' or 'IN' operators, the query plan switches from a CassandraFilter to a regular Filter, so all of the Cassandra table's data is scanned, fetched, and only then filtered. This is not the expected behavior, because the Apache Calcite plugin supports those operators. The result is very slow queries and high resource consumption.
Screenshots
(boitier_id, libelle, unite and periode are keys)
Expected behavior (using a CassandraFilter)

Unexpected behavior (when using the lt or gt operator)

Possible solution that gets closer to the correct behavior, but not completely: the whole dataset for (76, '3dProd_C1', 'W') is loaded instead of just the portion we would like to use

Expected behavior
The query should use a CassandraFilter to fetch data efficiently, even when using the '>' and '<' operators, rather than a normal Filter, which requires fetching all the data from the queried table.
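For anyone trying to reproduce this without the screenshots, here is a rough sketch of how the two plans could be compared. It only builds `EXPLAIN PLAN FOR` statements and classifies a plan's text by the operator names mentioned in this report (CassandraFilter versus a plain Filter). The table path, the mapping of the values 76, '3dProd_C1' and 'W' onto boitier_id, libelle and unite, and the bound used on periode are all assumptions for illustration, not taken from the actual schema.

```python
# Hypothetical table path: the report does not name the keyspace or table.
TABLE = "cassandra.my_keyspace.my_table"

# Assumed mapping of the reported values onto three of the four key columns.
KEY_PREDICATE = "boitier_id = 76 AND libelle = '3dProd_C1' AND unite = 'W'"


def explain_sql(query: str) -> str:
    """Wrap a query in Drill's EXPLAIN PLAN FOR statement."""
    return "EXPLAIN PLAN FOR " + query.strip().rstrip(";")


def pushdown_status(plan_text: str) -> str:
    """Classify a plan by the filter operator names that appear in its text.

    This is plain substring matching on the names used in this report,
    not a parser for Drill's plan format.
    """
    pushed = "CassandraFilter" in plan_text
    # Any "Filter" left after removing "CassandraFilter" is a generic Filter.
    generic = "Filter" in plan_text.replace("CassandraFilter", "")
    if pushed and generic:
        return "partially pushed down"
    if pushed:
        return "pushed down"
    if generic:
        return "filtered in Drill"
    return "no filter"


if __name__ == "__main__":
    # Equality-only query: the case reported to use a CassandraFilter.
    eq_query = f"SELECT * FROM {TABLE} WHERE {KEY_PREDICATE}"
    # Range query: the case reported to fall back to a regular Filter.
    # The bound 0 is a placeholder; the type of periode is not given.
    range_query = f"{eq_query} AND periode > 0"
    for q in (eq_query, range_query):
        print(explain_sql(q))
```

Running each printed statement in Drill and passing the returned plan text to `pushdown_status` should make it easy to tell whether a given Drill build still falls back to the generic Filter for the range predicate.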
Should our question not belong here, feel free to remove it, but please point us to where we could ask it.
@CarusoGuillaume Thanks for reporting this. One thing... we recently merged a PR which updated the Drill query planner and likely includes updates to the Cassandra adapter. I'd be curious if trying this query with the current master of Drill 2.0 has any improvement.
@cgivre Is the update available for Docker? The latest tag pulls version 1.20.2, and that version still performs a full table scan when using an operator that should result in a CassandraFilter.
@CarusoGuillaume he'll be referring to the snapshot builds from the master branch. Look for Docker Hub tags starting with "master".
Hi! The issue is still present in Apache Drill 2.0.