Shashank Agarwal
Update Spanner templates for GoogleSQL vs PG Interface
The JDBCToJDBC template needs to accept a list of primary key column names when tables are auto-generated. Example config from the GCSToSpanner template: https://github.com/GoogleCloudPlatform/dataproc-templates/blob/main/java/src/main/java/com/google/cloud/dataproc/templates/gcs/GCSToSpanner.java#L70 Also update the corresponding notebooks (such as MSSQLToPostgres).
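A minimal sketch of how a comma-separated list of primary key columns could be turned into a DDL clause. The function name `primary_key_clause` and the comma-separated input format are assumptions for illustration, not the template's actual configuration. The sketch also assumes ANSI-style double-quoted identifiers, which a given target database may not use.

```python
def primary_key_clause(raw_value: str) -> str:
    """Build a PRIMARY KEY clause from a comma-separated column list.

    Hypothetical helper: the input format and the ANSI double-quote
    identifier style are assumptions, not the template's real behavior.
    """
    # Split on commas and drop empty entries and surrounding whitespace.
    columns = [part.strip() for part in raw_value.split(",") if part.strip()]
    if not columns:
        raise ValueError("at least one primary key column is required")
    # Escape embedded double quotes by doubling them (ANSI SQL convention).
    quoted = ", ".join('"' + col.replace('"', '""') + '"' for col in columns)
    return f"PRIMARY KEY ({quoted})"
```

For example, `primary_key_clause("id, region")` yields a composite key clause over both columns, while an empty string is rejected instead of silently producing a table without a key.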
For RDBMS, the hardening goal can be 100 GB.
Test whether the following scenarios work with the GCSToSpanner template. You may use any of the templates (JDBCToGCS, BQToGCS, etc.) to generate test data in GCS. 1. Test for appending data into...
Document to describe how to debug and/or scale templates.
All notebooks have a hardcoded working directory. It needs to be computed dynamically, because a user may check out the repository into a different sub-directory.
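One possible way to compute the working directory is to walk upward from the current directory until a repository marker is found. This is only a sketch: the function name `find_repo_root` and the choice of `.git` as the marker are assumptions, and the notebooks may need a different marker or start point.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Optional[Path] = None, marker: str = ".git") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: `.git` as the default marker is an assumption.
    """
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the root.
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no '{marker}' found at or above {current}")
```

A notebook could then call `find_repo_root()` once and build every other path relative to the result, so the checkout location no longer matters.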
Test with 5 TB of data in the source. Make changes to the template code as necessary to make it work.
Upgrade the Dataproc Serverless runtime version from 1.1 to 1.2. This needs to be done for both Java and Python. Supported versions: https://cloud.google.com/dataproc-serverless/docs/concepts/versions/dataproc-serverless-versions#supported-dataproc-serverless-for-spark-runtime-versions