terraform-provider-iterative
📚 Epic: Cloud data sync for non-DVC scenarios
Goal: recover deep learning jobs, minimize data sync for reusable machines (#209)
Cloud data sync: all data is synced directly through cloud storage (S3, etc.). This scenario does not include direct data sync from the user's laptop to a cloud instance.
First, we need to research best practices for data syncing. Open questions:
- Do we need a file watcher?
- DVC or rclone or ...?
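One possible answer to the file-watcher question is periodic polling instead of OS-specific change notifications: take a snapshot of file metadata, take another later, and diff them. The sketch below assumes that modification time plus size is a good-enough change signal; `take_snapshot` and `diff_snapshots` are hypothetical helpers for illustration, not part of the provider, DVC, or rclone.

```python
from pathlib import Path
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: relative POSIX path -> (mtime, size)
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record mtime and size for every regular file under root."""
    snap: Snapshot = {}
    root_path = Path(root)
    for path in root_path.rglob("*"):
        if path.is_file():
            st = path.stat()
            snap[path.relative_to(root_path).as_posix()] = (st.st_mtime, st.st_size)
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str]]:
    """Return (changed_or_added, deleted) paths between two snapshots."""
    # A file counts as changed if it is new or its (mtime, size) pair differs.
    changed = {p for p, meta in new.items() if old.get(p) != meta}
    deleted = set(old) - set(new)
    return changed, deleted
```

Polling trades latency for simplicity and portability; a real watcher would react faster but depends on platform APIs.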
🔔 @dmpetrov & @iterative/cml, is this still relevant now that the `iterative_task` resource exists?
Yes, it is.
- efficiency (try to sync diffs rather than entire files)
- workspace awareness (skip uploading `.git/`, `.terraform*`, etc.)
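Both requirements can be combined in a simple manifest-based approach: hash every file that is not ignored, compare against the hashes already stored in the cloud, and upload only what differs. This is a minimal sketch, assuming the remote manifest is available as a plain dict; the helper names and the `DEFAULT_IGNORE` list are hypothetical. Note it works at whole-file granularity; syncing diffs *within* a file (as rsync's delta-transfer algorithm does) would be a further step.

```python
import fnmatch
import hashlib
from pathlib import Path
from typing import Dict, List, Sequence

# Hypothetical default ignore list, taken from the examples in this thread.
DEFAULT_IGNORE = (".git", ".terraform*")


def is_ignored(rel_path: str, patterns: Sequence[str]) -> bool:
    """True if any path component matches any ignore pattern."""
    parts = rel_path.split("/")
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)


def build_manifest(root: str, patterns: Sequence[str] = DEFAULT_IGNORE) -> Dict[str, str]:
    """Map each non-ignored file's relative path to its SHA-256 digest."""
    root_path = Path(root)
    manifest: Dict[str, str] = {}
    # rglob still walks ignored directories; a production version would prune them.
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path).as_posix()
        if path.is_file() and not is_ignored(rel, patterns):
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def files_to_upload(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Only files that are new or whose content hash differs need uploading."""
    return sorted(p for p, digest in local.items() if remote.get(p) != digest)
```

Keeping the ignore list as a parameter leaves room for the `.git` question raised later in the thread: the default could skip it while users opt back in.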
> efficiency (try to sync diffs rather than entire files)
Same company, two DVC implementations? 🤔
> workspace awareness (skip uploading [...])
- Skipping upload of `.terraform` sounds like an opinionated-for-good decision
- Skipping upload of `.git` is rather controversial, though
From the documentation describing how the task's `storage` is copied, my reading is:
with `workdir = "."`, everything gets copied to the instance, but only files written into `./somedir` (given `output = "somedir"`) get copied back?
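To make that reading concrete, here is a small model of it: everything under `workdir` is uploaded, and after the run only files under the `output` directory are downloaded. This encodes the interpretation in the question, not the provider's actual implementation; `planned_transfers` is a hypothetical helper and paths are assumed to be relative to `workdir`.

```python
from typing import Iterable, List, Tuple


def planned_transfers(
    workdir_files: Iterable[str],
    remote_files_after_run: Iterable[str],
    output: str,
) -> Tuple[List[str], List[str]]:
    """Model of the reading above: upload everything, download only output/."""
    # The trailing slash keeps "somedirX/..." from matching output="somedir".
    prefix = output.strip("/") + "/"
    uploads = sorted(workdir_files)
    downloads = sorted(f for f in remote_files_after_run if f.startswith(prefix))
    return uploads, downloads
```

Under this reading, a file the job writes to `other/tmp.bin` stays on the instance and is never copied back.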