terraform-provider-iterative

📚 Epic: Cloud data sync for not-dvc scenarios

Open dmpetrov opened this issue 2 years ago • 5 comments

Goal: recover deep learning jobs, minimize data sync for reusable machines (#209)

Cloud data sync: all data is synced through cloud storage directly (S3, etc.). This scenario does not include direct data sync from the user's laptop to a cloud instance.

First, we need to research best practices for data syncing. Open questions:

  1. Do we need a file watcher?
  2. DVC or rclone or ...?

dmpetrov avatar Sep 13 '21 07:09 dmpetrov

🔔 @dmpetrov & @iterative/cml, is this relevant after the iterative_task resource?

0x2b3bfa0 avatar Apr 24 '22 12:04 0x2b3bfa0

yes it is.

  • efficiency (try to sync diffs rather than entire files)
  • workspace awareness (skip uploading .git/, .terraform* etc.)
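
As a rough illustration of the two points above, here is a minimal Python sketch, not the provider's actual implementation. It assumes exclusion works by matching individual path components against glob patterns, and it approximates "sync diffs" at the file level by comparing SHA-256 digests against a manifest of what is already in cloud storage (true intra-file diffs would need something like rsync's rolling checksums). The names `EXCLUDES`, `build_manifest` and `changed_files` are hypothetical.

```python
import fnmatch
import hashlib
import os

# Hypothetical exclude patterns, taken from the examples in this comment.
EXCLUDES = [".git", ".terraform*"]


def is_excluded(name, patterns=EXCLUDES):
    """Return True if a single path component matches any exclude pattern."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)


def build_manifest(root, patterns=EXCLUDES):
    """Map relative file paths to SHA-256 digests, skipping excluded names."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not is_excluded(d, patterns)]
        for name in filenames:
            if is_excluded(name, patterns):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            manifest[rel] = digest
    return manifest


def changed_files(local, remote):
    """Return paths that are new or whose digest differs from the remote manifest."""
    return sorted(p for p, digest in local.items() if remote.get(p) != digest)
```

Only the paths returned by `changed_files(build_manifest("."), remote_manifest)` would need uploading, where `remote_manifest` is assumed to have been stored next to the data on the previous sync.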

casperdcl avatar Apr 25 '22 08:04 casperdcl

> efficiency (try to sync diffs rather than entire files)

Same company, two DVC implementations? 🤔

0x2b3bfa0 avatar Apr 25 '22 09:04 0x2b3bfa0

> workspace awareness (skip uploading [...])

  • Skipping upload of .terraform sounds like an opinionated–for–good decision
  • Skipping upload of .git is rather controversial, though

0x2b3bfa0 avatar Apr 25 '22 09:04 0x2b3bfa0

To me, the documentation describing the copying behavior of the task's storage reads as follows: with workdir = '.', everything gets copied to the instance, but given output = 'somedir', only files written into ./somedir get copied back?
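
If that reading is right, the transfer rule could be sketched as below. This is only a model of the behavior as described in this comment, not code from the provider; `plan_transfers` and its parameters are hypothetical names.

```python
from pathlib import PurePosixPath


def plan_transfers(workdir_files, remote_files, output):
    """Return (upload, download) path lists under the reading above.

    workdir_files: relative paths in the local working directory.
    remote_files: relative paths present on the instance after the task runs.
    output: directory whose contents are copied back.
    """
    # Everything under workdir is copied to the instance.
    upload = sorted(workdir_files)
    out = PurePosixPath(output)
    # Only files located inside the output directory are copied back.
    download = sorted(
        p for p in remote_files if out in PurePosixPath(p).parents
    )
    return upload, download
```

For example, with `output = "somedir"`, a file such as `somedir/model.pt` would come back, while `tmp/cache.bin` written elsewhere on the instance would not.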

dacbd avatar Apr 26 '22 01:04 dacbd