New Elasticsearch query runner

Open arikfr opened this issue 4 years ago • 6 comments

Our current Elasticsearch support is in a very bad state: it has issues with aggregated queries, nested objects, and new versions of Elasticsearch. The codebase is also in poor shape, which makes fixes hard.

We are going to re-implement the connector, addressing these issues and making room for future improvements.

Implementation notes:

  • Unless there is a request for it, we will only support Elasticsearch's API and not the Kibana flavor.
  • This will be a new query runner. The current ones will be marked as deprecated, but existing queries using them will keep using them.
  • We will have support for SQL queries on top of Elasticsearch. There seem to be two flavors of it: community version and the X-Pack version. We should support both. Worth considering having the different SQL connectors as separate types from the ones that use the API directly (but reuse code, of course).

Previous issues:

  • Aggregated queries issues: #2789, #3575, #3043
  • Incompatibility with newer versions: #1744
  • Other: #1596, #1075, #2791

Previous attempts at addressing these issues:

  • Support for elasticsearch-sql plugin: #2582
  • Another SQL support: #3393
  • Schema fetching errors: #3692
  • Aggregated results support: #3845
  • Nested fields support: #3456
  • Nested aggregations support: #3809

arikfr avatar Oct 27 '19 09:10 arikfr

Worth noting that the new implementation will at least have tests for the query parsing logic to make future changes easier.

arikfr avatar Oct 27 '19 09:10 arikfr

  1. Since the ES API is quite easily accessible with HTTP requests, I guess we don't want to add the dependency to a more specific Python ES client, keeping it the same way as the current ES query runner. Am I right?
  2. Do we want to support nested fields and nested aggregations the same way it's done for the MongoDB query runner? Nested objects get flattened and the name of the column uses a delimiter like . to indicate that it originally comes from a nested object. Note that this is the approach taken out of the box by the X-Pack SQL wrapper.
  3. Same question regarding documents with different fields in the same collection. It seems that the MongoDB runner looks at the first and last documents and takes fields from there, while X-Pack SQL shows all possible fields defined in the collection for each document (with the value null if the document does not have a specific field).
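
To make the flattening in question 2 concrete, here is a minimal sketch, not code from Redash or from X-Pack SQL. The function name `flatten`, its parameters, and the sample document are hypothetical; it only illustrates how nested objects could be turned into columns named with a `.` delimiter.

```python
def flatten(doc, parent_key="", delimiter="."):
    """Flatten nested dicts into one level, joining key paths with a delimiter.

    Hypothetical helper for illustration only; lists are left untouched.
    """
    flat = {}
    for key, value in doc.items():
        # Build the column name from the path walked so far.
        column = f"{parent_key}{delimiter}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened columns.
            flat.update(flatten(value, column, delimiter))
        else:
            flat[column] = value
    return flat


# Made-up document shaped like the "_source" of a search hit.
hit = {"user": {"name": "ada", "geo": {"country": "UK"}}, "status": 200}
flat_hit = flatten(hit)
print(flat_hit)
```

Each nested path becomes one column name (for example `user.geo.country`), which matches the naming convention described above for the MongoDB runner and X-Pack SQL.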

NicolasLM avatar Nov 06 '19 11:11 NicolasLM

  1. Yes, I had the same thoughts. From what I've seen for our needs the Elasticsearch clients don't offer much.

  2. Considering this is what X-Pack SQL does and it's consistent with what we do with other connectors, it makes sense to keep it. I would also consider reusing the code between Mongo and ES.

  3. What you described happens when we query MongoDB for the schema. But in the query result we pick up columns from all the documents (see parse_results). We should follow a similar pattern.
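
As a rough illustration of answer 3, the sketch below collects columns from every document and fills missing fields with `None`. It is an assumption about the general pattern, not the actual `parse_results` implementation; `build_table` and the sample documents are hypothetical.

```python
def build_table(documents):
    """Collect columns from all documents and pad missing fields with None.

    Hypothetical helper; assumes each document is already a flat dict.
    """
    columns = []
    seen = set()
    for doc in documents:
        for name in doc:
            # Keep first-seen order so the column layout is stable.
            if name not in seen:
                seen.add(name)
                columns.append(name)
    # Every row gets every column; absent fields become None.
    rows = [{name: doc.get(name) for name in columns} for doc in documents]
    return columns, rows


docs = [{"a": 1}, {"a": 2, "b": "x"}]
columns, rows = build_table(docs)
print(columns)
print(rows)
```

Scanning all documents costs one extra pass over the result set, but it avoids dropping fields that appear only in the middle of the results, which is the risk of sampling only the first and last documents.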

arikfr avatar Nov 06 '19 11:11 arikfr

Hi, can you describe the overall status of the new ES pipeline and give us a timeline for when it will be ready for production? We are experiencing problems with ES queries and are waiting for the new implementation to be in place.

vkuznet avatar Jul 08 '20 20:07 vkuznet

Same here, we're desperately waiting for the new ES connector. Is there any way I can contribute to getting it across the line?

itssimon avatar Jul 13 '20 02:07 itssimon

@susodapop @arikfr Is this available in the latest stable Docker tag, 10.1.0.b50633?

docker pull redash/redash:10.1.0.b50633

mavencode01 avatar Nov 21 '23 16:11 mavencode01