Support direct runner (Colab support)

Open RobbeSneyders opened this issue 1 year ago • 4 comments

Fondant pipelines currently cannot be executed in Google Colab, which would be a great way for users to try out Fondant. This is because Google Colab cannot run Docker.

We should investigate the best way to support this. Two options are:

  • Creating a VenvRunner which executes each component in a virtual environment. For local custom components, this should be doable, but it currently won't work for reusable components. It would require changes to how we package and share reusable components, since currently only the Docker container and component spec are shared, while we would need the original source files.
  • Using udocker as a Docker replacement. It's not a complete drop-in replacement though, so we should validate how feasible this is. I did a quick PoC and was able to execute a Fondant container using udocker directly. More changes would be needed to let Fondant use udocker.
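
To make the first option concrete, here is a minimal sketch of what a VenvRunner could do for a local component, using only the Python standard library. The names `venv_python` and `run_component_in_venv` are hypothetical and not part of Fondant, and the sketch assumes a component can be started as a plain Python script, which is not how reusable components are shared today.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    # Windows places executables under Scripts/, POSIX systems under bin/.
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def run_component_in_venv(env_dir, script, requirements=(), args=()):
    """Hypothetical helper: run `script` with the interpreter of a venv.

    The venv is created on first use. pip is only bootstrapped when there
    are requirements to install, which keeps the no-dependency case fast.
    """
    env_dir = Path(env_dir)
    python = venv_python(env_dir)
    if not python.exists():
        venv.create(env_dir, with_pip=bool(requirements))
    if requirements:
        subprocess.run(
            [str(python), "-m", "pip", "install", *requirements],
            check=True,
        )
    # check=True raises CalledProcessError if the component exits non-zero.
    return subprocess.run(
        [str(python), str(script), *args],
        check=True,
        capture_output=True,
        text=True,
    )
```

A caller would pass the component's entry script and its requirements, for example `run_component_in_venv("envs/my_component", "src/main.py", ["pandas"])`. Reusing an environment that was first created without pip and later needs packages is not handled here.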

RobbeSneyders, Jan 02 '24 14:01

Find a gist of my PoC here: https://gist.github.com/RobbeSneyders/e0ffd2341d3a153a0ccf728266525aa0

RobbeSneyders, Jan 02 '24 14:01

What is the exact limitation of Colab that makes us unable to use Docker? Is it sudo rights?

GeorgesLorre, Jan 03 '24 12:01

I believe the issue is that Colab is already running in a Docker container itself. As far as I know, Docker in Docker isn't really possible unless the host is configured in a specific way. And even then, you would be connecting to the host Docker, which is not something Google wants to enable (understandably).

RobbeSneyders, Jan 03 '24 12:01

I'm updating this ticket to better reflect the scope:

We want to support direct execution of components via a direct runner:

  • This will provide Colab support for running pipelines locally (since Docker is unsupported there)
  • This will power eager execution with a very fast way of running components
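
As an illustration of how a direct runner could slot in next to the Docker-based one, the sketch below falls back to it whenever Docker is not usable, which is the situation in Colab. The function names and the "docker"/"direct" labels are assumptions made for this example, not existing Fondant API. The only external command used is `docker info`, which exits non-zero when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_available(which=shutil.which, run=subprocess.run) -> bool:
    """Return True only if a docker CLI exists and its daemon responds."""
    if which("docker") is None:
        return False
    try:
        result = run(["docker", "info"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def choose_runner(which=shutil.which, run=subprocess.run) -> str:
    """Hypothetical selection rule: prefer Docker, else run directly."""
    return "docker" if docker_available(which, run) else "direct"
```

The `which` and `run` parameters exist only so the check can be exercised without a real Docker installation.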

This does mean that this is a new type of runner and there are some things to be solved:

  • Installation of dependencies (and some kind of cache?)
  • Support for reusable components
  • Needs to work in Colab since it is a prime way for users to start experimenting with Fondant
  • ...
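
For the dependency cache, one possible approach is to key each environment on its set of requirements, so components with identical dependencies share one environment instead of reinstalling. This is only a sketch: the function name and the default cache location are made up for illustration.

```python
import hashlib
from pathlib import Path


def env_cache_dir(requirements, cache_root="~/.cache/fondant-venvs") -> Path:
    """Map a set of requirement strings to a stable cache directory.

    `cache_root` is a hypothetical default, not a path Fondant uses.
    """
    # Strip whitespace, drop blanks and duplicates, and sort so that the
    # order in which requirements are listed does not change the key.
    normalized = sorted({r.strip() for r in requirements if r.strip()})
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    return Path(cache_root).expanduser() / digest[:16]
```

A runner could check whether this directory already exists before creating and populating a new environment. Unpinned requirements such as `pandas` map to the same key even when newer versions are released, so pinning versions would matter for reproducibility.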

Some ideas on how to proceed:

  • Virtual environment runner where every component is run in a new venv
  • udocker (see above)
  • Leverage KFP's SubprocessRunner
  • ...

GeorgesLorre, Feb 14 '24 14:02