elasticsearch-dbapi
elasticsearch-dbapi copied to clipboard
SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0]
Hello preset-io team,
I am facing some issues with using elasticsearch-dbapi version: 0.2.6. I have described the issue in detail in the apache superset project too. SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0]
To summarize:
Issue 1
-
elasticsearch-dbapi seems to be not able to parse the index metadata from AWS ES endpoint. Issue happening due to https://github.com/preset-io/elasticsearch-dbapi/blob/master/es/opendistro/api.py#L236
Sample request made to ES:
curl --location --request GET 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/<INDEX- NAME>/_mapping?format=json'
Here's the sample response coming from ES endpoint:
{"dummy_app":{"mappings":{"dummy_app":{"properties":{"param1":{"type":"boolean"},"param2": {"type":"float"},"param3":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param4": {"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param5":{"type":"text","fields":{"keyword": {"type":"keyword","ignore_above":256}}},"param6":{"type":"date"},"deleted":{"type":"boolean"},"event_time": {"type":"date"},"gwGen":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}}}
Issue 2
-
elasticsearch-dbapi cannot parse response from ES endpoint for a SQL query, getting error "Missing columns field, maybe it's an elastic sql ep" es.exceptions.DataError: Missing columns field, maybe it's an elastic sql ep. Issue happening due to https://github.com/preset-io/elasticsearch-dbapi/blob/master/es/opendistro/api.py#L282
The sample curl command returns valid response from ES, however, possibly the query is not properly formed (it is missing
?format=jdbc
):curl --location --request POST 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/_opendistro/_sql' \ --header 'Content-Type: application/json' \ --data-raw '{ "query": "SELECT * FROM dummy_index where some_id = 1 order by id DESC limit 1" }'
Sample response from ES endpoint for the above rest call:
{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits": {"total":43464,"max_score":null,"hits":[{"_index":"dummy_index","_type":"dummy_index","_id":"dummy_index-app_1636096855000","_score":null,"_source": {"param1":"value1","param2":false,"param3":"1636096855000","param4":"2021-11-05T07:21:01.439Z"},"sort":[1636096861439]}]}}
Environment
- browser type and version: Google Chrome [Version 95.0.4638.69 (Official Build) (x86_64)]
- superset version:
1.3.1
- python version:
3.7
- node.js version:
node -v
- any feature flags active: None
- pip elasticsearch-dbapi version:
0.2.6
- pip elasticsearch version:
7.13.4
- Kibana version:
6.8.0
Any help from you is greatly appreciated.
Please see here for the response formats for the Elasticsearch Opendistro SQL.
By default the plugin returns original response from Elasticsearch in JSON. Because this is the native response from Elasticsearch, extra efforts are needed to parse and interpret it.
I am pretty certain that the second issue is happening because ?format=jdbc
is not set on the es query endpoint.
See here the path is set to _opendistro/_sql
instead it should be _opendistro/_sql?format=jdbc
However, the code assumes (See lines 280 and 281) the response in jdbc format. Hence, the error is happening in parsing the response. Can someone please fix this?
@dpgaspar: I am tagging you since, you are an active member of this project. Thanks in advance!
In case you need the following:
pip freeze
pip freeze
aiohttp==3.7.2
alembic==1.4.3
amqp==2.6.1
# Editable install with no version control (apache-superset==1.3.1)
-e /app
apispec==3.3.2
async-timeout==3.0.1
attrs==20.2.0
Authlib==0.15.5
Babel==2.8.0
backoff==1.10.0
billiard==3.6.3.0
bleach==3.3.0
boto3==1.16.10
botocore==1.19.10
Brotli==1.0.9
cached-property==1.5.2
cachelib==0.1.1
cachetools==4.2.4
celery==4.4.7
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
click==7.1.2
colorama==0.4.4
configparser==5.0.2
convertdate==2.3.0
cron-descriptor==1.2.24
croniter==0.3.36
cryptography==3.3.2
defusedxml==0.6.0
Deprecated==1.2.11
deprecation==2.1.0
dnspython==2.0.0
elasticsearch==7.13.4
elasticsearch-dbapi==0.2.6
email-validator==1.1.1
et-xmlfile==1.0.1
Flask==1.1.2
Flask-AppBuilder==3.3.0
Flask-Babel==1.0.0
Flask-Caching==1.10.1
Flask-Compress==1.8.0
Flask-Cors==3.0.9
Flask-JWT-Extended==3.24.1
Flask-Login==0.4.1
Flask-Migrate==2.5.3
Flask-OpenID==1.3.0
Flask-SQLAlchemy==2.4.4
flask-talisman==0.7.0
Flask-WTF==0.14.3
future==0.18.2
geographiclib==1.50
geopy==2.0.0
google-api-core==2.1.1
google-api-python-client==2.27.0
google-auth==2.3.1
google-auth-httplib2==0.1.0
googleapis-common-protos==1.53.0
graphlib-backport==1.0.3
gunicorn==20.0.4
holidays==0.10.3
httplib2==0.20.1
humanize==3.1.0
idna==2.10
idps-client==3.94.0
idps-mgmt-swagger-client==3.94.0
idps-sdk==3.94.0
idps-swagger-client==3.94.0
ijson==3.1.2.post0
importlib-metadata==4.8.1
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
Jinja2==2.11.3
jmespath==0.10.0
jsonlines==1.2.0
jsonschema==3.2.0
kombu==4.6.11
korean-lunar-calendar==0.2.1
linear-tsv==1.1.0
machina-swagger-client==3.94.0
Mako==1.1.3
Markdown==3.3.3
MarkupSafe==1.1.1
marshmallow==3.9.0
marshmallow-enum==1.5.1
marshmallow-sqlalchemy==0.23.1
msgpack==1.0.0
multidict==5.0.0
mysqlclient==1.4.2.post1
natsort==7.0.1
numpy==1.19.4
openpyxl==3.0.5
packaging==21.0
pandas==1.2.2
parsedatetime==2.6
pgsanity==0.2.9
Pillow==7.2.0
pks-swagger-client==3.94.0
polyline==1.4.0
prison==0.1.3
progress==1.5
protobuf==3.19.0
psycopg2==2.8.5
psycopg2-binary==2.8.5
pyarrow==4.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyathena==2.3.0
pycparser==2.20
pycryptodome==3.11.0
pydruid==0.6.1
PyGithub==1.54.1
PyHive==0.6.3
PyJWT==1.7.1
PyMeeus==0.3.7
pyparsing==2.4.7
pyrsistent==0.16.1
python-dateutil==2.8.1
python-dotenv==0.15.0
python-editor==1.0.4
python-geohash==0.8.5
python3-openid==3.2.0
pytz==2020.4
PyYAML==5.4.1
rcc-swagger-client==3.94.0
redis==3.2.1
requests==2.24.0
requests-aws4auth==1.1.1
rfc3986==1.4.0
rsa==4.7.2
s3transfer==0.3.3
sasl==0.2.1
secrecy-swagger-client==3.94.0
selenium==3.141.0
simplejson==3.17.2
six==1.15.0
slackclient==2.5.0
SQLAlchemy==1.3.20
SQLAlchemy-Utils==0.36.8
sqlparse==0.3.0
tableschema==1.20.0
tabulate==0.8.9
tabulator==1.52.5
tenacity==8.0.1
thrift==0.13.0
thrift-sasl==0.4.2
typing-extensions==3.7.4.3
unicodecsv==0.14.1
uritemplate==4.1.1
urllib3==1.25.11
vine==1.3.0
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
WTForms==2.3.3
WTForms-JSON==0.3.3
xlrd==1.2.0
yarl==1.6.2
zipp==3.4.1
Hi @harshgadhia,
What's your AWS OpenSearch (ElasticSearch) version? 6.8.0?
@dpgaspar Yes that is correct:
{
"name": "lagQbOg",
"cluster_name": "332592635208:metrics-search-dev",
"cluster_uuid": "2OAhfNl3S7aumuuBxQcxCQ",
"version": {
"number": "6.8.0",
"build_flavor": "oss",
"build_type": "zip",
"build_hash": "8169a24",
"build_date": "2021-04-21T19:26:55.782637Z",
"build_snapshot": false,
"lucene_version": "7.7.0",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"
},
"tagline": "You Know, for Search"
}
Here's the latest information I have:
The Opensearch (aka elasticsearch) has support to query SQL via the _opendistro/_sql
endpoint since version >=6.5.
However, only after version >=7.4, the endpoint presumable sends response in jdfc format (that is what must be the assumption I guess, when writing this.
However, for older versions <7.4, you have to explicitly pass on a query parameter ?format=jdbc
. Looks like doing this will make the library compatible for the older versions.
Looking forward for your help in this matter, thanks in advance.
@dpgaspar Did you get a chance to look at this? Really appreciate a response on the above. Thank you in advance!
sounds reasonable to me @harshgadhia are you willing to make a PR? We can fetch current cluster version, or create a new connection string parameter
@dpgaspar sure, I can try looking into it, and send a PR your way!
@harshgadhia,
I am new to superset and I am facing same issue with AWS ES. I am not sure how we can pass query parameter ?format=jdbc
in _opendistro/_sql
in superset