cdp-spark-datasource

Create a complete introductory example

Open halvard-cognite opened this issue 5 years ago • 4 comments

The current documentation takes a lot of knowledge for granted.

I would love it if this repo had a complete example, with instructions, that I could compile and run somewhere (Dataproc, for example).

Might build on/copy this: https://cloud.google.com/dataproc/docs/tutorials/spark-scala

halvard-cognite avatar May 24 '19 21:05 halvard-cognite

Thanks for the feedback! You're right, the tutorials only cover using the data source from a Spark cluster that is already set up and has the library available.

Just so I understand correctly: are you requesting a more thorough step-by-step guide for building and deploying Spark with the data source available, or does this apply to the read/write examples as well?

hakontro avatar May 27 '19 06:05 hakontro

The read/write examples are probably fine.

I'm guessing I won't be the last person with no Spark or Scala experience who shows up at this repo wanting to test something out with data from CDF.

halvard-cognite avatar May 27 '19 11:05 halvard-cognite

So you mean a tutorial that is more about setting things up (installing the data source in Dataproc or other clusters) than about usage, which is explained in https://github.com/cognitedata/notebook-examples/blob/master/spark/tutorials/Cognite%20Spark%20data%20source%20tutorial.ipynb?

wjoel avatar May 27 '19 18:05 wjoel

To be fair, I did not find the tutorials until after posting this issue and talking to Emil. But yes, for me the big challenge was getting a minimal code sample running: dependency management in the ecosystem, etc.

halvard-cognite avatar May 31 '19 08:05 halvard-cognite