Consider clarifying, is this an alternative for Kubeflow?

Open austinkeller opened this issue 4 years ago • 1 comments

As a dummy who is evaluating different options for ML Ops, I don't have a full picture of how Kubeflow works. Does trains-agent integrate with Kubeflow? Or is it a more R&D-friendly replacement?

austinkeller avatar Oct 16 '20 21:10 austinkeller

Hi @austinkeller

> Or is it a more R&D-friendly replacement?

Kind of, but it also integrates with Kubeflow :)

Specifically, Kubeflow assumes all steps are self-contained containers, and that data can be volume mounted etc. In this respect, trains-agent solves the containerization problem and adds logging into the process.

To understand how trains works, the usual development steps are:

  1. Write code on a "local" machine. With trains, all the code/environment/arguments are logged (along with a few other things that are less relevant here).
  2. Clone the experiment in the UI (or from code / automation).
  3. Put the experiment into the execution queue (the trains scheduler; it also supports priorities etc., and its UI is part of the system UI, see trains-server).
  4. trains-agent, running on a remote machine in daemon mode, pulls the experiment from the execution queue, sets up the environment accordingly, and launches / monitors the process.
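To make the flow in the four steps above concrete, here is a toy model in plain Python. It does not use the trains API at all: `Experiment`, `ExecutionQueue`, `clone`, and `agent_step` are hypothetical names invented for illustration, and the "lower number is served first" priority rule is an assumption of this sketch, not a statement about how the trains scheduler orders work.

```python
import copy
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Experiment:
    """Toy stand-in for a logged experiment (hypothetical, not a trains object)."""
    name: str
    code: str            # step 1: the code that was logged
    environment: list    # step 1: the packages that were logged
    arguments: dict      # step 1: the arguments that were logged
    status: str = "draft"


def clone(experiment, **overrides):
    """Step 2: copy an experiment, optionally changing some arguments."""
    new = copy.deepcopy(experiment)
    new.name = experiment.name + " (clone)"
    new.arguments.update(overrides)
    new.status = "draft"
    return new


class ExecutionQueue:
    """Step 3: toy priority queue (assumed: lower number first, ties are FIFO)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def enqueue(self, experiment, priority=0):
        experiment.status = "queued"
        heapq.heappush(self._heap, (priority, next(self._counter), experiment))

    def pull(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def agent_step(queue):
    """Step 4: one iteration of a toy agent loop."""
    experiment = queue.pull()
    if experiment is None:
        return None
    # A real agent would recreate the environment and launch the process here.
    experiment.status = "running"
    return experiment
```

In this model, cloning an experiment with `clone(original, lr=0.01)`, enqueueing the clone, and calling `agent_step(queue)` returns the clone marked as running while the original stays unchanged. The real system performs the same hand-off between the machine where the code was written and the remote machine where the agent runs.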

Back to Kubeflow: since creating the experiment is done automatically (see step (1): trains records the environment and creates the experiment at runtime), trains-agent can build a docker container for the experiment to later be used by Kubeflow. This makes the packaging a lot easier (see trains-agent build --docker). You can actually make it even lighter and use trains-agent to set up and launch an experiment without packaging it, by using a base container and letting trains-agent set up everything inside the container (see trains-agent execute).
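As a rough sketch of the two options just described, the helpers below only assemble the command lines; they do not run anything. The subcommands `build --docker` and `execute` come from the text above, while the `--id` and `--target` flags are written from memory of the trains-agent README and should be treated as assumptions to check against `trains-agent build --help` and `trains-agent execute --help`. The function names and the placeholder values are hypothetical.

```python
def build_container_command(task_id, target):
    """Option 1: package the experiment into its own docker image.

    Assumed flags: --id selects the experiment, --target names the image.
    """
    return ["trains-agent", "build", "--id", task_id, "--docker", "--target", target]


def execute_command(task_id):
    """Option 2: set up and launch the experiment inside an existing container.

    Assumed flag: --id selects the experiment.
    """
    return ["trains-agent", "execute", "--id", task_id]
```

For example, `" ".join(build_container_command("<task-id>", "<new-docker-name>"))` gives a command line to review before running it by hand or passing the list to `subprocess.run`. The first option produces an image that a Kubeflow step can run as-is; the second keeps a generic base image and defers the setup to container start.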

Does that remove a bit of the mystery? What exactly is your use case? (Is it more development-oriented, or at the productization stage?)

bmartinn avatar Oct 16 '20 22:10 bmartinn