documentation-website
[DOC] Misleading and unclear documentation for the Spark Connector in the SQL/PPL docs
What do you want to do?
- [x] Request a change to existing documentation
- [ ] Add new documentation
- [ ] Report a technical problem with the documentation
- [ ] Other
Tell us about your request. Regarding: https://opensearch.org/docs/latest/search-plugins/sql/settings/#spark-connector-settings
- According to this comment, the Spark connector only supports AWS EMR Serverless Spark, which means AWS credentials are required. This should be made clear in the docs.
- The docs lack examples of how to set up EMR Serverless Spark with OpenSearch and of where to provide the configuration (like `spark.uri`). For a user, it's unclear how to set up a basic working example.
- Some of the config properties lack examples and information about which values are valid:
  - `spark.uri`: The description "The identifier for your Spark data source." is misleading. It lacks an example, and it's unclear what the default is and whether the property is mandatory.
  - `spark.auth.type`: It's unclear which values are valid, what the default is, and whether the property is mandatory.
- The Spark connector docs lack a reference to https://opensearch.org/docs/latest/dashboards/management/data-sources/ (and potentially https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/), as well as an explanation and examples of how to add Spark as a data source.
- The docs are not consistent with https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/spark_connector.rst. For example, `emr.cluster` is missing.
- The PPL example is unclear:
POST /_plugins/_ppl
content-type: application/json
{
"query": "source = my_spark.sql('select * from alb_logs')"
}
What does `my_spark` refer to?
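My best guess, which the docs do not confirm, is that `my_spark` is the name of a data source that was registered beforehand, and that the part inside `sql('...')` is passed through to Spark. The sketch below only shows how I read the request body under that assumption. `build_ppl_body` is a hypothetical helper, not part of any OpenSearch client, and nothing here contacts a cluster.

```python
import json


def build_ppl_body(datasource_name: str, spark_sql: str) -> str:
    """Build the JSON body for POST /_plugins/_ppl.

    Assumption (not confirmed by the docs): the prefix before `.sql(...)`
    is the name of a previously configured Spark data source.
    """
    query = f"source = {datasource_name}.sql('{spark_sql}')"
    return json.dumps({"query": query})


body = build_ppl_body("my_spark", "select * from alb_logs")
print(body)
```

If this reading is right, the docs should say where and how the `my_spark` name gets defined.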
Version: all versions since the Spark connector has been supported
What other resources are available?
- https://github.com/opensearch-project/opensearch-spark/pull/606#discussion_r1752113941
- https://github.com/opensearch-project/opensearch-spark/issues/4#issuecomment-1631451276
- https://github.com/opensearch-project/sql/issues/948#issue-1418627454
- https://github.com/opensearch-project/opensearch-spark/discussions/317
- https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst
@salyh: Thanks for submitting this issue! I'll find a dev who can help make the changes you requested.
@YANG-DB @Naarcha-AWS any update? We need to clarify this to get https://github.com/opensearch-project/opensearch-spark/pull/606 done
Closing as stale