
Support dumping multiple Spanner databases to Avro

Open CAFxX opened this issue 6 years ago • 4 comments

To use Cloud Scheduler effectively with the Spanner->Avro template, it would be ideal if https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/spanner/ExportPipeline.java allowed specifying multiple Database IDs (instead of a single one, as is currently the case).

The current template already creates a subdirectory for the exported database in the GCS output directory: if multiple databases were specified multiple subdirectories would be created, one for each database.
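A minimal sketch of how a multi-database parameter could map onto per-database subdirectories. It assumes a hypothetical comma-separated `databaseIds` parameter (the current template accepts a single Database ID) and a hypothetical `<outputDir>/<databaseId>/` layout; the template's actual subdirectory naming may differ. `parseDatabaseIds` and `exportDirs` are illustrative names, not part of ExportPipeline.java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class MultiDatabaseExportPaths {

  /**
   * Splits a hypothetical comma-separated "databaseIds" parameter into
   * individual IDs, trimming whitespace and skipping empty or repeated entries.
   */
  static List<String> parseDatabaseIds(String raw) {
    List<String> ids = new ArrayList<>();
    if (raw == null) {
      return ids;
    }
    for (String part : raw.split(",")) {
      String id = part.trim();
      if (!id.isEmpty() && !ids.contains(id)) {
        ids.add(id);
      }
    }
    return ids;
  }

  /**
   * Maps each database ID to its own subdirectory under the output directory.
   * The "<outputDir>/<databaseId>/" layout is an assumption for illustration.
   */
  static Map<String, String> exportDirs(String outputDir, List<String> databaseIds) {
    String base = outputDir.endsWith("/") ? outputDir : outputDir + "/";
    Map<String, String> dirs = new LinkedHashMap<>(); // keeps input order
    for (String id : databaseIds) {
      dirs.put(id, base + id + "/");
    }
    return dirs;
  }

  public static void main(String[] args) {
    List<String> ids = parseDatabaseIds("orders, users,,billing");
    Map<String, String> dirs = exportDirs("gs://my-bucket/exports", ids);
    // Prints one line per database, e.g. "orders -> gs://my-bucket/exports/orders/"
    dirs.forEach((id, dir) -> System.out.println(id + " -> " + dir));
  }
}
```

Keeping the single-database case as a one-element list means existing invocations would produce the same one-subdirectory output.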

As an extension, it would be even more useful to make the Database ID optional, in which case the Dataflow job would enumerate the databases in the specified Spanner instance and then export all of them.
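The "optional Database ID" behavior amounts to a simple fallback rule, sketched below. `DatabaseLister` is a hypothetical stand-in for whatever call enumerates the databases in an instance (for example, a list-databases call on the Spanner admin API); it is not an existing template interface, and `resolve` is likewise an illustrative name.

```java
import java.util.List;

public final class ResolveTargetDatabases {

  /** Hypothetical abstraction over "list all databases in a Spanner instance". */
  @FunctionalInterface
  interface DatabaseLister {
    List<String> listDatabaseIds(String instanceId);
  }

  /**
   * Returns the explicitly requested database IDs, or every database in the
   * instance when none were given.
   */
  static List<String> resolve(String instanceId, List<String> requested, DatabaseLister lister) {
    if (requested != null && !requested.isEmpty()) {
      return List.copyOf(requested);
    }
    return List.copyOf(lister.listDatabaseIds(instanceId));
  }

  public static void main(String[] args) {
    // Fake lister standing in for a real admin API call.
    DatabaseLister fake = instance -> List.of("orders", "users", "billing");
    System.out.println(resolve("my-instance", List.of("orders"), fake)); // [orders]
    System.out.println(resolve("my-instance", List.of(), fake)); // [orders, users, billing]
  }
}
```

Note that the enumeration has to happen before the per-database exports are laid out, which is exactly where a pre-built job graph gets in the way.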

The goal is to be able to trigger an export of one, multiple, or all databases in a Spanner instance from a Cloud Scheduler job.

CAFxX, Jun 03 '19 01:06

Agreed that this would be very useful. However, due to the nature of Dataflow templates, the job graph cannot be changed once the template is built, which means supporting this would require more work on the template feature side.

We are working on something that would remove this limitation, and we will revisit this soon.

azurezyq, Jun 06 '19 21:06

@azurezyq thanks for the reply.

> However due to the nature of dataflow templates, job graph cannot be changed once the template is built.

Just to confirm: does this apply even if the databases are exported serially?

For the record, I also filed the same request via enterprise support: https://console.cloud.google.com/support/cases/detail/19577487?folder&organizationId=956776603191

CAFxX, Jun 11 '19 03:06

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.

github-actions[bot], Jun 13 '24 02:06

Issue isn't solved yet

CAFxX, Jun 13 '24 04:06

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.

github-actions[bot], Jan 22 '25 02:01

Issue isn't solved yet

CAFxX, Jan 22 '25 03:01

I no longer work on this project. This kind of issue should now be solvable with the newer Dataflow Flex Templates. It seems that I cannot unassign myself from the issue, though; the current owner of the repo can triage it.

Thanks.

azurezyq, Jan 22 '25 06:01

I agree: if this is still needed, a new Dataflow Flex Template is the way to go.

liferoad, Jun 01 '25 18:06