elassandra

Inconsistent Data When Querying Elasticsearch

Open · ribeirodba opened this issue 2 years ago · 1 comment

Look at this test I've performed in Elassandra with Python.

I created a function that queries data using the Cassandra Python driver and pages through the results manually:

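```python
from datetime import timedelta
from timeit import default_timer as timer  # assumed: original imports not shown

import pandas as pd
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# `session` is assumed to be an already-connected cassandra.cluster.Session.


def process_query_cassandra(query, fetch_size=5000,
                            consistency_level=ConsistencyLevel.LOCAL_ONE):
    """Run `query`, page through every result page, return (DataFrame, elapsed time)."""
    start = timer()
    statement = SimpleStatement(query, fetch_size=fetch_size,
                                consistency_level=consistency_level)
    paging_state = None
    rows = []
    while True:
        # Fetch one page, resuming from the previous page's paging state.
        results = session.execute(statement, paging_state=paging_state)
        rows.extend(results.current_rows)
        paging_state = results.paging_state
        if paging_state is None:  # no more pages
            break
    df = pd.DataFrame(rows)
    end = timer()
    return df, timedelta(seconds=end - start)
```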

Table f0101 has 872390 rows.

When I query using plain CQL, the results are correct (every row is returned):

```python
query1 = """ select * from "dlfinjdep"."f0101" ALLOW FILTERING """
```

```
Running Cassandra #1 (22-06-01 12:43) Rows: 872390 seconds: 0:03:17.609349
Running Cassandra #2 (22-06-01 12:46) Rows: 872390 seconds: 0:03:04.289089
```

However, when I query the Elasticsearch index through CQL (using the es_query and es_options columns), fewer rows come back than the table contains, and the count changes from run to run:

```python
query2 = """ select * from "dlfinjdep"."f0101" WHERE es_query='{"query":{"match_all":{}}}'
AND es_options='indices=dlfinjdep-f0101-index' ALLOW FILTERING """
```
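Writing the es_query JSON by hand inside a CQL string literal is easy to get wrong. A minimal sketch, assuming the same keyspace, table, and index names as above, that builds the statement from a Python dict with json.dumps and escapes single quotes by doubling them (the CQL string-literal rule). `build_es_cql` is a hypothetical helper for illustration, not part of Elassandra or the driver:

```python
import json


def build_es_cql(keyspace, table, es_query, index):
    """Build a SELECT that delegates filtering to Elassandra's es_query/es_options.

    Hypothetical helper for illustration only.
    """
    # Serialize the query DSL compactly, then escape ' for a CQL string literal.
    query_json = json.dumps(es_query, separators=(",", ":")).replace("'", "''")
    options = f"indices={index}".replace("'", "''")
    return (
        f'SELECT * FROM "{keyspace}"."{table}" '
        f"WHERE es_query='{query_json}' "
        f"AND es_options='{options}' ALLOW FILTERING"
    )


query2 = build_es_cql("dlfinjdep", "f0101",
                      {"query": {"match_all": {}}}, "dlfinjdep-f0101-index")
print(query2)
```

The resulting string can be passed to process_query_cassandra exactly like the hand-written query2.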

```
Running Elastic #1 (22-06-01 12:50) Rows: 841350 seconds: 0:03:49.136313
Running Elastic #2 (22-06-01 12:54) Rows: 834372 seconds: 0:03:33.985948
```
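To put numbers on the discrepancy: relative to the 872390 rows in the table, the two Elasticsearch-backed runs are short by 31040 and 38018 rows (about 3.6% and 4.4%), and the two runs also disagree with each other by 6978 rows. A small sketch that reproduces this arithmetic from the figures above, plus a helper that lists which primary keys are missing from the Elasticsearch result. The key columns of f0101 are not shown in the issue, so any `key_cols` you pass in is an assumption:

```python
TABLE_ROWS = 872390  # row count reported for dlfinjdep.f0101
ES_RUNS = {"Elastic #1": 841350, "Elastic #2": 834372}  # rows returned per run


def shortfall(expected, actual):
    """Return (missing rows, missing fraction) for one run."""
    missing = expected - actual
    return missing, missing / expected


for name, rows in ES_RUNS.items():
    missing, frac = shortfall(TABLE_ROWS, rows)
    print(f"{name}: {missing} rows missing ({frac:.2%})")


def missing_keys(full_keys, partial_keys):
    """Keys present in the full (CQL) result but absent from the partial (ES) one."""
    return set(full_keys) - set(partial_keys)
```

With the two DataFrames returned by process_query_cassandra, the keys could be supplied as `df[key_cols].itertuples(index=False, name=None)`, where `key_cols` is a hypothetical list holding the table's actual primary-key column names.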

ribeirodba · Jun 18 '22 09:06

Which version of Elassandra are you using?

serversteam · Nov 11 '22 14:11