terraform-provider-iterative

PR that replaces `machine` with `task` within DVC machine

Open DavidGOrtega opened this issue 2 years ago • 7 comments

A PR that alters tpi behaviour by replacing `machine` with `task`.

The basic idea would be to start a dummy service, so that they would control `task` the same way they currently control `machine`.

DavidGOrtega avatar Dec 07 '21 18:12 DavidGOrtega

potentially could transfer this to tpi?

casperdcl avatar Dec 07 '21 19:12 casperdcl

"DVC machine"?

dacbd avatar Dec 07 '21 19:12 dacbd

🙊

0x2b3bfa0 avatar Dec 07 '21 23:12 0x2b3bfa0

🙊

👀🎁🎄

dacbd avatar Dec 08 '21 00:12 dacbd

This probably belongs in the tpi Python package. But also, IMO the Python package should reflect all of the resource types supported in the actual provider (so both `iterative_machine` and `iterative_task` resources, but maybe not the CML-specific types).

pmrowla avatar Dec 08 '21 01:12 pmrowla

I see that `iterative_machine` is now undocumented in the readme (and the formal provider docs), but having a standardized (cloud-agnostic) way of specifying a remote machine, with consistent mappings for CPU/GPU resources, still seems like it would be a useful general-purpose (non-DVC/CML related) feature for us to provide.
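To make the "consistent mappings" idea concrete, here is a minimal Python sketch of a cloud-agnostic size lookup. Everything in it is an assumption for illustration: the size names, the `MachineSpec` fields, the `resolve_instance_type` helper and the placeholder instance names are hypothetical and are not taken from the provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    """Generic, cloud-agnostic description of a machine (hypothetical)."""
    cpus: int
    memory_gb: int
    gpus: int = 0


# Hypothetical generic sizes; the real provider may use different names/values.
GENERIC_SIZES = {
    "small": MachineSpec(cpus=2, memory_gb=8),
    "large": MachineSpec(cpus=16, memory_gb=64),
    "large+gpu": MachineSpec(cpus=16, memory_gb=64, gpus=1),
}

# Placeholder instance names, NOT verified cloud instance types.
INSTANCE_TYPES = {
    ("aws", "small"): "example-aws-small",
    ("aws", "large+gpu"): "example-aws-large-gpu",
    ("gcp", "small"): "example-gcp-small",
}


def resolve_instance_type(cloud: str, size: str) -> str:
    """Translate a generic size into a cloud-specific instance name."""
    if size not in GENERIC_SIZES:
        raise ValueError(f"unknown generic size {size!r}")
    try:
        return INSTANCE_TYPES[(cloud, size)]
    except KeyError:
        raise ValueError(f"no mapping for size {size!r} on cloud {cloud!r}") from None
```

With a table like this, a user would only ever write the generic size, and the per-cloud translation stays in one place.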

pmrowla avatar Dec 08 '21 01:12 pmrowla

To summarise DVC-required features of TPI task:

  • [x] serverless (no SaaS deps)
  • [x] multi-cloud provisioning support
  • [x] sync via folder
  • [ ] sync directly (e.g. via SSH, with no S3 bucket and therefore no recovery of data for spot instances)
  • [x] one-off user-specified script/experiment
  • [x] user-specified setup deps (manually in initial script)
  • [x] download all logs (`terraform refresh && terraform show`)
  • [x] user can go offline

So there's "just" one major outstanding feature required for this integration :)
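As a rough illustration of the "download all logs" item, the two Terraform commands from the list could be wrapped from Python along these lines. This is only a sketch: the `fetch_task_state` helper and its `run` parameter are hypothetical names, and it assumes the `terraform` CLI is on `PATH` and that `workdir` holds an already-initialised configuration.

```python
import subprocess


def fetch_task_state(workdir, run=subprocess.run):
    """Refresh Terraform state, then return the text printed by `terraform show`.

    `run` is injectable (hypothetical design choice) so the function can be
    exercised without a real Terraform installation.
    """
    # Pull the latest remote state into the local state file.
    run(["terraform", "refresh"], cwd=workdir, check=True)
    # Print the refreshed state in human-readable form, without colour codes.
    result = run(
        ["terraform", "show", "-no-color"],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `fetch_task_state("path/to/task")` periodically would fit the "user can go offline" item too, since nothing needs to stay running between calls.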

casperdcl avatar Dec 17 '21 19:12 casperdcl