DataflowTemplates

Terraform template to launch a sharded bulk migration

Open • Deep1998 opened this issue 1 year ago • 1 comment

This template is used to orchestrate bulk migration jobs for a sharded setup. It accepts a shardedConfig describing all the shards and groups multiple physical shards into each job according to a batch_size parameter.
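A minimal sketch of the batching described above, assuming batch_size is the maximum number of physical shards per job and that shards are grouped in the order they appear in shardedConfig. The Shard fields and the batch_shards helper are hypothetical and are not taken from the template itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    """One physical shard. Hypothetical fields: the real shardedConfig
    schema used by the template may name or structure these differently."""
    shard_id: str
    host: str
    port: int


def batch_shards(shards: List[Shard], batch_size: int) -> List[List[Shard]]:
    """Group shards into consecutive batches of at most batch_size.

    Each batch stands for one bulk migration job, so N shards produce
    ceil(N / batch_size) jobs; only the last batch may be smaller.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return [shards[i:i + batch_size] for i in range(0, len(shards), batch_size)]


if __name__ == "__main__":
    # Ten hypothetical shards on the same host, consecutive ports.
    shards = [Shard(f"shard-{n}", "10.0.0.1", 3306 + n) for n in range(10)]
    for job_index, batch in enumerate(batch_shards(shards, batch_size=4)):
        print(job_index, [s.shard_id for s in batch])
```

With 10 shards and batch_size=4 this yields three jobs of 4, 4 and 2 shards. Within Terraform itself, the built-in chunklist function performs the same grouping on a list; whether this template uses it is not stated in the PR.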

Deep1998 • Aug 28 '24

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 50.11%. Comparing base (bf05990) to head (39ef33f). Report is 32 commits behind head on main.

@@             Coverage Diff              @@
##               main    #1819      +/-   ##
============================================
+ Coverage     42.99%   50.11%   +7.11%     
+ Complexity     3418     1290    -2128     
============================================
  Files           819      363     -456     
  Lines         47910    19751   -28159     
  Branches       5152     1980    -3172     
============================================
- Hits          20600     9898   -10702     
+ Misses        25639     9190   -16449     
+ Partials       1671      663    -1008     
Components                        Coverage  Δ
spanner-templates                 63.57%    <ø> (-0.14%) ↓
spanner-import-export             ∅         <ø> (∅)
spanner-live-forward-migration    75.11%    <ø> (-0.06%) ↓
spanner-live-reverse-replication  68.93%    <ø> (-2.69%) ↓
spanner-bulk-migration            84.19%    <ø> (+0.34%) ↑

see 508 files with indirect coverage changes
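The headline numbers in this report can be cross-checked, assuming Codecov's default behaviour of counting partially covered lines as not covered and rounding percentages down to two decimals, i.e. coverage = hits / (hits + misses + partials). The helper names below are illustrative only.

```python
import math
from fractions import Fraction


def coverage_ratio(hits: int, misses: int, partials: int) -> Fraction:
    """Exact coverage ratio; partial lines count against coverage."""
    return Fraction(hits, hits + misses + partials)


def round_down_pct(ratio: Fraction) -> float:
    """Percentage rounded down to two decimals (meant for non-negative ratios)."""
    return math.floor(ratio * 10_000) / 100


if __name__ == "__main__":
    base = coverage_ratio(20600, 25639, 1671)  # main: 47910 lines
    head = coverage_ratio(9898, 9190, 663)     # PR head: 19751 lines
    print(round_down_pct(base))
    print(round_down_pct(head))
    print(round_down_pct(head - base))
```

The exact base coverage is about 42.997%, which would round to 43.00% under round-to-nearest, so the reported 42.99% is consistent with rounding down; the +7.11% delta likewise matches the rounded-down difference of the exact ratios.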

codecov[bot] • Aug 28 '24

You should still provide a sample for the simplest use case: launching a bulk migration for a single database/instance. Deleting the multiple-jobs sample makes sense, but please add a simpler, newer one that:

  1. Configures any required permissions.
  2. Runs the Dataflow job.
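The two requested steps could look roughly like this when scripted with gcloud instead of Terraform; in a Terraform sample the same steps would typically be google_project_iam_member resources plus a google_dataflow_flex_template_job resource (from the google-beta provider). This sketch only builds the command lines and does not run them. The role list, the template path and the template parameter names in the example are assumptions, so check them against the bulk migration template's documentation before use.

```python
import shlex
from typing import Dict, List, Sequence


def iam_binding_commands(project: str, service_account: str,
                         roles: Sequence[str]) -> List[List[str]]:
    """Step 1: grant each role to the job's service account at project level."""
    member = f"serviceAccount:{service_account}"
    return [
        ["gcloud", "projects", "add-iam-policy-binding", project,
         f"--member={member}", f"--role={role}"]
        for role in roles
    ]


def run_job_command(job_name: str, template_gcs_path: str, region: str,
                    service_account: str,
                    parameters: Dict[str, str]) -> List[str]:
    """Step 2: launch a single Flex Template job for one source database."""
    for key, value in parameters.items():
        # gcloud splits --parameters on commas; values containing commas would
        # need gcloud's alternate-delimiter escaping, which this sketch omits.
        if "," in value:
            raise ValueError(f"parameter {key!r} contains a comma")
    joined = ",".join(f"{k}={v}" for k, v in parameters.items())
    return ["gcloud", "dataflow", "flex-template", "run", job_name,
            f"--template-file-gcs-location={template_gcs_path}",
            f"--region={region}",
            f"--service-account-email={service_account}",
            f"--parameters={joined}"]


if __name__ == "__main__":
    sa = "migration-sa@my-project.iam.gserviceaccount.com"  # hypothetical
    # Assumed role set; the template's docs are authoritative.
    roles = ["roles/dataflow.worker", "roles/spanner.databaseUser"]
    for cmd in iam_binding_commands("my-project", sa, roles):
        print(shlex.join(cmd))
    print(shlex.join(run_job_command(
        "bulk-migration-single",
        # Assumed template location.
        "gs://dataflow-templates-us-central1/latest/flex/Sourcedb_to_Spanner_Flex",
        "us-central1",
        sa,
        # Hypothetical parameter names and values.
        {"sourceConfigURL": "jdbc:mysql://10.0.0.1:3306/db",
         "instanceId": "my-instance",
         "databaseId": "my-db"})))
```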

manitgupta • Sep 03 '24

I considered this, but launching a single job via Terraform seems like overkill, since launching a single bulk migration is already straightforward. Customers would likely use this only for sharded use cases.

Deep1998 • Sep 03 '24

I don't think it's overkill. On the contrary, Terraform offers state-management advantages that most other launch methods don't. Moreover, if a user is already using Terraform elsewhere in the migration (e.g., for live migration), they will likely want to use it here as well.

I think adding a small, rudimentary example would be quick and would check our boxes in terms of providing examples. Should we go ahead and add it?

manitgupta • Sep 03 '24