dataproc-templates
Dataproc templates and pipelines for solving simple in-cloud data tasks
ref. [[Spike] Explore the usage of PubSub Lite Spark Connector for Python template](https://github.com/GoogleCloudPlatform/dataproc-templates/issues/594)
Enable **mssql-to-postgres-notebook** to be executed from the command line. 1. Create a script like [this](https://github.com/GoogleCloudPlatform/dataproc-templates/blob/main/notebooks/mysql2spanner/MySqlToSpanner_parameterize_script.py) to execute the notebook with command line arguments. 2. Update the notebook to use `IS_PARAMETERIZED`...
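The two steps above could be sketched as a small wrapper script. This is a minimal sketch, not the repository's actual script: it assumes the `papermill` library is used to run the notebook, and the flag names (`--notebook`, `--output-notebook`, `--param`) and the helper names are hypothetical. It also assumes the notebook reads `IS_PARAMETERIZED` from its parameters cell.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments (flag names are illustrative assumptions)."""
    parser = argparse.ArgumentParser(
        description="Run a notebook with parameters supplied on the command line"
    )
    parser.add_argument("--notebook", required=True, help="Path to the input notebook")
    parser.add_argument(
        "--output-notebook",
        default=None,
        help="Path for the executed notebook (defaults to the input path)",
    )
    parser.add_argument(
        "--param",
        action="append",
        default=[],
        metavar="KEY=VALUE",
        help="Notebook parameter; may be repeated",
    )
    return parser.parse_args(argv)


def build_parameters(pairs):
    """Turn KEY=VALUE strings into a parameter dict for the notebook."""
    # Assumption: the notebook checks IS_PARAMETERIZED to skip its interactive defaults.
    params = {"IS_PARAMETERIZED": True}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected KEY=VALUE, got {pair!r}")
        params[key] = value
    return params


def run(argv=None):
    """Execute the notebook with the parsed parameters."""
    # Imported lazily so argument parsing works even where papermill is not installed.
    import papermill as pm

    args = parse_args(argv)
    pm.execute_notebook(
        args.notebook,
        args.output_notebook or args.notebook,
        parameters=build_parameters(args.param),
    )
```

Calling `run(["--notebook", "some_notebook.ipynb", "--param", "KEY=VALUE"])` would execute the notebook with those values injected. All values arrive as strings, so the notebook would need to convert any that must be numeric.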
Enhance README files with the common issues users run into and the solutions we have been applying for them
As we control parallelism via `numPartitions`, we do not want Oracle to then run the queries issued by Spark in parallel as well. It would be really easy to overload a source system if this...
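One way this could be approached is sketched below. It is an assumption, not the fix adopted in the issue: it uses Spark's JDBC `sessionInitStatement` option to run Oracle's `ALTER SESSION DISABLE PARALLEL QUERY` on each connection, so that the only parallelism comes from `numPartitions`. The helper name `oracle_jdbc_options` and its arguments are hypothetical.

```python
def oracle_jdbc_options(url, table, partition_column, lower_bound, upper_bound, num_partitions):
    """Build Spark JDBC reader options for a partitioned Oracle read."""
    return {
        "url": url,
        "dbtable": table,
        # Spark splits the read into num_partitions range queries on this column.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        # Assumption: disable Oracle parallel query in each session Spark opens,
        # so each partition query runs serially on the database side.
        "sessionInitStatement": "ALTER SESSION DISABLE PARALLEL QUERY",
    }
```

With an existing `SparkSession` named `spark`, the options could be applied as `spark.read.format("jdbc").options(**opts).load()`, where `opts` is the returned dict.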
Enable **OracleToSpanner_notebook** to be executed from the command line. 1. Create a script like [this](https://github.com/GoogleCloudPlatform/dataproc-templates/blob/main/notebooks/mysql2spanner/MySqlToSpanner_parameterize_script.py) to execute the notebook with command line arguments. 2. Update the notebook to use `IS_PARAMETERIZED`...
Update Spanner templates for GoogleSQL vs PG Interface