elasticsearch-dbapi icon indicating copy to clipboard operation
elasticsearch-dbapi copied to clipboard

SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0]

Open harshgadhia opened this issue 3 years ago • 9 comments

Hello preset-io team,

I am facing some issues with using elasticsearch-dbapi version: 0.2.6. I have described the issue in detail in the apache superset project too. SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0]

To summarize:

Issue 1

  • elasticsearch-dbapi seems to be not able to parse the index metadata from AWS ES endpoint. Issue happening due to https://github.com/preset-io/elasticsearch-dbapi/blob/master/es/opendistro/api.py#L236

    Sample request made to ES:

    curl --location --request GET 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/<INDEX- 
    NAME>/_mapping?format=json'
    

    Here's the sample response coming from ES endpoint:

    {"dummy_app":{"mappings":{"dummy_app":{"properties":{"param1":{"type":"boolean"},"param2": 
    {"type":"float"},"param3":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param4": 
    {"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param5":{"type":"text","fields":{"keyword": 
    {"type":"keyword","ignore_above":256}}},"param6":{"type":"date"},"deleted":{"type":"boolean"},"event_time": 
    {"type":"date"},"gwGen":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}}}
    

Issue 2

  • elasticsearch-dbapi cannot parse response from ES endpoint for a SQL query, getting error "Missing columns field, maybe it's an elastic sql ep" es.exceptions.DataError: Missing columns field, maybe it's an elastic sql ep. Issue happening due to https://github.com/preset-io/elasticsearch-dbapi/blob/master/es/opendistro/api.py#L282

    The sample curl command returns valid response from ES, however, possibly the query is not properly formed (it is missing ?format=jdbc):

    curl --location --request POST 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/_opendistro/_sql' 
    \
    --header 'Content-Type: application/json' \
    --data-raw '{
     "query": "SELECT * FROM dummy_index where some_id = 1 order by id DESC limit 1"
    }'
    
    

    Sample response from ES endpoint for the above rest call:

    {"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits": 
    {"total":43464,"max_score":null,"hits":[{"_index":"dummy_index","_type":"dummy_index","_id":"dummy_index-app_1636096855000","_score":null,"_source": 
    {"param1":"value1","param2":false,"param3":"1636096855000","param4":"2021-11-05T07:21:01.439Z"},"sort":[1636096861439]}]}}
    

Environment

  • browser type and version: Google Chrome [Version 95.0.4638.69 (Official Build) (x86_64)]
  • superset version: 1.3.1
  • python version: 3.7
  • node.js version: node -v
  • any feature flags active: None
  • pip elasticsearch-dbapi version: 0.2.6
  • pip elasticsearch version: 7.13.4
  • Kibana version: 6.8.0

Any help from you is greatly appreciated.

harshgadhia avatar Nov 05 '21 00:11 harshgadhia

Please see here for the response formats for the Elasticsearch Opendistro SQL.

By default the plugin returns original response from Elasticsearch in JSON. Because this is the native response from Elasticsearch, extra efforts are needed to parse and interpret it.

I am pretty certain that the second issue is happening because ?format=jdbc is not set on the es query endpoint. See here the path is set to _opendistro/_sql instead it should be _opendistro/_sql?format=jdbc

However, the code assumes (See lines 280 and 281) the response in jdbc format. Hence, the error is happening in parsing the response. Can someone please fix this?

@dpgaspar: I am tagging you since, you are an active member of this project. Thanks in advance!

harshgadhia avatar Nov 05 '21 08:11 harshgadhia

In case you need the following:

pip freeze

 pip freeze
aiohttp==3.7.2
alembic==1.4.3
amqp==2.6.1
# Editable install with no version control (apache-superset==1.3.1)
-e /app
apispec==3.3.2
async-timeout==3.0.1
attrs==20.2.0
Authlib==0.15.5
Babel==2.8.0
backoff==1.10.0
billiard==3.6.3.0
bleach==3.3.0
boto3==1.16.10
botocore==1.19.10
Brotli==1.0.9
cached-property==1.5.2
cachelib==0.1.1
cachetools==4.2.4
celery==4.4.7
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
click==7.1.2
colorama==0.4.4
configparser==5.0.2
convertdate==2.3.0
cron-descriptor==1.2.24
croniter==0.3.36
cryptography==3.3.2
defusedxml==0.6.0
Deprecated==1.2.11
deprecation==2.1.0
dnspython==2.0.0
elasticsearch==7.13.4
elasticsearch-dbapi==0.2.6
email-validator==1.1.1
et-xmlfile==1.0.1
Flask==1.1.2
Flask-AppBuilder==3.3.0
Flask-Babel==1.0.0
Flask-Caching==1.10.1
Flask-Compress==1.8.0
Flask-Cors==3.0.9
Flask-JWT-Extended==3.24.1
Flask-Login==0.4.1
Flask-Migrate==2.5.3
Flask-OpenID==1.3.0
Flask-SQLAlchemy==2.4.4
flask-talisman==0.7.0
Flask-WTF==0.14.3
future==0.18.2
geographiclib==1.50
geopy==2.0.0
google-api-core==2.1.1
google-api-python-client==2.27.0
google-auth==2.3.1
google-auth-httplib2==0.1.0
googleapis-common-protos==1.53.0
graphlib-backport==1.0.3
gunicorn==20.0.4
holidays==0.10.3
httplib2==0.20.1
humanize==3.1.0
idna==2.10
idps-client==3.94.0
idps-mgmt-swagger-client==3.94.0
idps-sdk==3.94.0
idps-swagger-client==3.94.0
ijson==3.1.2.post0
importlib-metadata==4.8.1
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
Jinja2==2.11.3
jmespath==0.10.0
jsonlines==1.2.0
jsonschema==3.2.0
kombu==4.6.11
korean-lunar-calendar==0.2.1
linear-tsv==1.1.0
machina-swagger-client==3.94.0
Mako==1.1.3
Markdown==3.3.3
MarkupSafe==1.1.1
marshmallow==3.9.0
marshmallow-enum==1.5.1
marshmallow-sqlalchemy==0.23.1
msgpack==1.0.0
multidict==5.0.0
mysqlclient==1.4.2.post1
natsort==7.0.1
numpy==1.19.4
openpyxl==3.0.5
packaging==21.0
pandas==1.2.2
parsedatetime==2.6
pgsanity==0.2.9
Pillow==7.2.0
pks-swagger-client==3.94.0
polyline==1.4.0
prison==0.1.3
progress==1.5
protobuf==3.19.0
psycopg2==2.8.5
psycopg2-binary==2.8.5
pyarrow==4.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyathena==2.3.0
pycparser==2.20
pycryptodome==3.11.0
pydruid==0.6.1
PyGithub==1.54.1
PyHive==0.6.3
PyJWT==1.7.1
PyMeeus==0.3.7
pyparsing==2.4.7
pyrsistent==0.16.1
python-dateutil==2.8.1
python-dotenv==0.15.0
python-editor==1.0.4
python-geohash==0.8.5
python3-openid==3.2.0
pytz==2020.4
PyYAML==5.4.1
rcc-swagger-client==3.94.0
redis==3.2.1
requests==2.24.0
requests-aws4auth==1.1.1
rfc3986==1.4.0
rsa==4.7.2
s3transfer==0.3.3
sasl==0.2.1
secrecy-swagger-client==3.94.0
selenium==3.141.0
simplejson==3.17.2
six==1.15.0
slackclient==2.5.0
SQLAlchemy==1.3.20
SQLAlchemy-Utils==0.36.8
sqlparse==0.3.0
tableschema==1.20.0
tabulate==0.8.9
tabulator==1.52.5
tenacity==8.0.1
thrift==0.13.0
thrift-sasl==0.4.2
typing-extensions==3.7.4.3
unicodecsv==0.14.1
uritemplate==4.1.1
urllib3==1.25.11
vine==1.3.0
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
WTForms==2.3.3
WTForms-JSON==0.3.3
xlrd==1.2.0
yarl==1.6.2
zipp==3.4.1

harshgadhia avatar Nov 05 '21 08:11 harshgadhia

Hi @harshgadhia,

What's your AWS OpenSearch (ElasticSearch) version? 6.8.0?

dpgaspar avatar Nov 05 '21 12:11 dpgaspar

@dpgaspar Yes that is correct:

{
    "name": "lagQbOg",
    "cluster_name": "332592635208:metrics-search-dev",
    "cluster_uuid": "2OAhfNl3S7aumuuBxQcxCQ",
    "version": {
        "number": "6.8.0",
        "build_flavor": "oss",
        "build_type": "zip",
        "build_hash": "8169a24",
        "build_date": "2021-04-21T19:26:55.782637Z",
        "build_snapshot": false,
        "lucene_version": "7.7.0",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}

harshgadhia avatar Nov 05 '21 19:11 harshgadhia

Here's the latest information I have:

The Opensearch (aka elasticsearch) has support to query SQL via the _opendistro/_sql endpoint since version >=6.5. However, only after version >=7.4, the endpoint presumable sends response in jdfc format (that is what must be the assumption I guess, when writing this.

However, for older versions <7.4, you have to explicitly pass on a query parameter ?format=jdbc. Looks like doing this will make the library compatible for the older versions.

Looking forward for your help in this matter, thanks in advance.

harshgadhia avatar Nov 05 '21 22:11 harshgadhia

@dpgaspar Did you get a chance to look at this? Really appreciate a response on the above. Thank you in advance!

harshgadhia avatar Nov 10 '21 18:11 harshgadhia

sounds reasonable to me @harshgadhia are you willing to make a PR? We can fetch current cluster version, or create a new connection string parameter

dpgaspar avatar Nov 10 '21 21:11 dpgaspar

@dpgaspar sure, I can try looking into it, and send a PR your way!

harshgadhia avatar Nov 13 '21 00:11 harshgadhia

@harshgadhia, I am new to superset and I am facing same issue with AWS ES. I am not sure how we can pass query parameter ?format=jdbc in _opendistro/_sql in superset

prijopauly avatar Dec 12 '23 09:12 prijopauly