[DOC] Misleading and unclear documentation for the Spark Connector in the SQL/PPL docs

Open salyh opened this issue 1 year ago • 1 comments

What do you want to do?

  • [x] Request a change to existing documentation
  • [ ] Add new documentation
  • [ ] Report a technical problem with the documentation
  • [ ] Other

Tell us about your request. Regarding: https://opensearch.org/docs/latest/search-plugins/sql/settings/#spark-connector-settings

  • According to this comment, the Spark connector only supports AWS EMR Serverless Spark (which means I need AWS credentials). This should be made clear in the docs.

  • The docs lack examples of how to set up EMR Serverless Spark with OpenSearch and where to provide the configuration (like spark.uri). For a user, it's unclear how to set up a basic working example.

  • Some of the config properties lack examples and information about which values are valid:

    • spark.uri: the description "The identifier for your Spark data source." is misleading. It lacks an example and does not say what the default is or whether the setting is mandatory.
    • spark.auth.type: It's unclear which values are valid, what the default is, and whether the setting is mandatory.
  • The Spark connector docs lack a reference to https://opensearch.org/docs/latest/dashboards/management/data-sources/ (and potentially https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/), as well as an explanation and examples of how to add Spark as a data source.

  • The docs are not consistent with https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/spark_connector.rst

    • For example, emr.cluster is missing.
  • The PPL example is unclear:

POST /_plugins/_ppl
content-type: application/json
{
   "query": "source = my_spark.sql('select * from alb_logs')"
}

What does my_spark refer to?
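
To make the question concrete, here is a minimal Python sketch that only builds the request shown above without sending it. The helper name build_ppl_request is hypothetical, and treating the prefix before .sql(...) as a configurable data source name is an assumption that the docs do not currently confirm.

```python
import json


def build_ppl_request(datasource: str, sql: str) -> dict:
    """Build (but do not send) the PPL request from the docs example.

    Assumption: the token before ".sql(...)" is the name of a configured
    Spark data source. The docs do not state this explicitly.
    """
    query = f"source = {datasource}.sql('{sql}')"
    return {
        "method": "POST",
        "path": "/_plugins/_ppl",
        "headers": {"content-type": "application/json"},
        "body": json.dumps({"query": query}),
    }


request = build_ppl_request("my_spark", "select * from alb_logs")
print(request["method"], request["path"])
print(request["body"])
```

If that assumption holds, the docs should say where the name my_spark is defined and how it maps to settings such as spark.uri.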

Version: all versions since the Spark connector has been supported

What other resources are available?

  • https://github.com/opensearch-project/opensearch-spark/pull/606#discussion_r1752113941
  • https://github.com/opensearch-project/opensearch-spark/issues/4#issuecomment-1631451276
  • https://github.com/opensearch-project/sql/issues/948#issue-1418627454
  • https://github.com/opensearch-project/opensearch-spark/discussions/317
  • https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst

salyh avatar Sep 11 '24 14:09 salyh

@salyh: Thanks for submitting this issue! I'll find a dev who can help make the changes you requested.

Naarcha-AWS avatar Sep 17 '24 12:09 Naarcha-AWS

[Catch All Triage - 1, 2, 3, 4]

dblock avatar Sep 30 '24 16:09 dblock

@YANG-DB @Naarcha-AWS Any update? We need to clarify this to get https://github.com/opensearch-project/opensearch-spark/pull/606 done.

salyh avatar Oct 07 '24 08:10 salyh

Closing as stale

natebower avatar Jul 14 '25 18:07 natebower