terraform-provider-iterative
📚 Epic: Cloud data sync for non-DVC scenarios
Goal: recover deep learning jobs, minimize data sync for reusable machines (#209)
Cloud data sync: all data is synced through cloud storage (S3, etc.). This scenario does not include direct data sync from the user's laptop to a cloud instance.
First, we need to research best practices for data syncing. Open questions:
- Do we need a file watcher?
- DVC or rclone or ...?
🔔 @dmpetrov & @iterative/cml, is this relevant after the `iterative_task` resource?
Yes, it is.
- efficiency (try to sync diffs rather than entire files)
- workspace awareness (skip uploading `.git/`, `.terraform*`, etc.)
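The two points above can be sketched together in a few lines of Python. This is only an illustration, not how the provider works: `DEFAULT_EXCLUDES`, `is_excluded`, `build_manifest`, and `plan_sync` are hypothetical names, and the sketch uses a coarser technique than true diff syncing. It skips files whose content hash is unchanged, rather than transferring block-level deltas within a file.

```python
import fnmatch
import hashlib
import os

# Hypothetical default exclusions, taken from the examples in this thread.
DEFAULT_EXCLUDES = (".git", ".terraform*")


def is_excluded(rel_path, patterns):
    """True if any path component matches one of the glob patterns."""
    parts = rel_path.split(os.sep)
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)


def build_manifest(root, patterns=DEFAULT_EXCLUDES):
    """Map relative path -> SHA-256 digest for every non-excluded file."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not is_excluded(d, patterns)]
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if is_excluded(rel, patterns):
                continue
            with open(os.path.join(dirpath, name), "rb") as fh:
                manifest[rel] = hashlib.sha256(fh.read()).hexdigest()
    return manifest


def plan_sync(local, remote):
    """Return (to_upload, to_delete) given local and remote manifests."""
    to_upload = sorted(p for p, digest in local.items() if remote.get(p) != digest)
    to_delete = sorted(p for p in remote if p not in local)
    return to_upload, to_delete
```

Keeping the exclusion patterns as a parameter means the defaults can be overridden per workspace instead of being hard-coded.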
> efficiency (try to sync diffs rather than entire files)
Same company, two DVC implementations? 🤔
> workspace awareness (skip uploading [...])
- Skipping upload of `.terraform` sounds like an opinionated-for-good decision
- Skipping upload of `.git` is rather controversial, though
To me, the documentation describing the copy behavior of the task's storage reads: with `workdir = '.'`, everything gets copied to the instance, but only files written into `./somedir` (given `output = 'somedir'`) get copied back?
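To state that reading precisely, here is a small local simulation in Python. It encodes only the interpretation asked about above, not confirmed provider behavior; `run_task` and its `script` callback are hypothetical names, and a temporary directory stands in for the cloud instance.

```python
import os
import shutil
import tempfile


def run_task(workdir, output, script):
    """Simulate: upload all of workdir, run script remotely, download only output."""
    with tempfile.TemporaryDirectory() as tmp:
        instance = os.path.join(tmp, "instance")
        # Upload: the whole working directory goes to the (simulated) instance.
        shutil.copytree(workdir, instance)
        # Run: the script may write anywhere inside its copy of the workdir.
        script(instance)
        # Download: only the output subdirectory comes back (assumed reading).
        remote_out = os.path.join(instance, output)
        if os.path.isdir(remote_out):
            shutil.copytree(
                remote_out, os.path.join(workdir, output), dirs_exist_ok=True
            )
```

Under this reading, a file the script writes to `somedir/` reappears locally, while anything it writes elsewhere in the working directory is lost when the instance goes away.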