[Core feature] Specify task Python dependencies
Motivation: Why do you think this is important?
Currently, adding new Python dependencies to a task requires users to build a new Docker image containing them, which takes time and effort. Alternatively, dependencies can be installed at runtime with pip, but this can be unreliable because the environment isn't restarted after installation, and it is more verbose than it needs to be.
There should be a way to specify and install the required Python packages for tasks before running them.
Goal: What should the final outcome look like, ideally?
Airflow allows specifying requirements directly on the task decorator with task.virtualenv (see here for more details). These are then installed before the task runs.
Flyte's implementation could follow the same idea by adding a requirements parameter to its task decorator:
@task(requirements=["colorama==0.4.0"], ...)
def my_task(...):
    ...
As far as I understand, Airflow creates a new virtual environment to install these dependencies into. In Flyte, it is probably okay to install them directly into the active environment?
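To make the "install into the active environment" idea concrete, here is a minimal sketch in plain Python. It is not Flyte's API: `with_requirements`, `pip_install`, and the `installer` parameter are hypothetical names used only for illustration. The sketch installs the listed packages with pip once, right before the first call of the wrapped function.

```python
import functools
import importlib
import subprocess
import sys
from typing import Callable, Sequence


def pip_install(requirements: Sequence[str]) -> None:
    """Install packages into the currently active environment with pip."""
    if requirements:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *requirements])
        # Let the import system notice packages that were just installed.
        importlib.invalidate_caches()


def with_requirements(
    requirements: Sequence[str],
    installer: Callable[[Sequence[str]], None] = pip_install,
):
    """Hypothetical decorator: install `requirements` once, before the first call."""

    def decorator(fn):
        installed = False

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal installed
            if not installed:
                installer(list(requirements))
                installed = True
            return fn(*args, **kwargs)

        return wrapper

    return decorator


# Defining the function does not install anything; pip only runs on the first call.
@with_requirements(["colorama==0.4.0"])
def my_task() -> str:
    import colorama  # imported inside the function, i.e. after installation

    return colorama.__name__
```

This keeps the caveat from the motivation: the interpreter is not restarted, so a package that was already imported in a different version stays loaded as it was.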
Describe alternatives you've considered
Some more ideas that were shared on Slack:
- Allow specifying an init script that runs before the task. This init script could run pip install -r requirements.txt, and even more things beyond solving just this issue (e.g. installing dependencies with apt).
- Specify a path to a requirements.txt file instead of listing all dependencies in the task decorator.
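For the requirements.txt alternative, the file could be turned into the same kind of list that a requirements parameter would take. The following is only a rough sketch: `read_requirements` is a hypothetical helper, and it deliberately ignores pip options such as `-r` or `-e` and line continuations.

```python
from pathlib import Path
from typing import List


def read_requirements(path: str) -> List[str]:
    """Return requirement specifiers from a file, skipping blank lines and comments."""
    requirements = []
    for raw_line in Path(path).read_text().splitlines():
        # Drop trailing comments ("pkg==1.0  # note"), then surrounding whitespace.
        line = raw_line.split(" #", 1)[0].strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements
```

Passing the path straight to pip install -r, as in the init script idea, avoids the parsing entirely; the helper is only useful if the list form is needed.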
Propose: Link/Inline OR Additional context
Thread about this on Slack: https://flyte-org.slack.com/archives/CNMKCU6FR/p1653428501588649?thread_ts=1653428501.588649&cid=CNMKCU6FR
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes
I think ImageSpec handles this in a more elegant and less error-prone way? Wdyt @RobinKa
Yep ImageSpec seems good, can close imo