FR: use inline metadata for dependency management.
Hi,
Since PEP 723 we can now specify metadata, including package requirements, in a comment block at the top of a Python file. This way we can use a single Python file that fully describes a pipeline, right?
Example:
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "requests<3",
# "rich",
# ]
# ///
import requests
from rich.pretty import pprint
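For what it's worth, the PEP also defines how tools should locate and parse this block. Something along these lines (essentially the reference implementation from the PEP; it needs Python 3.11+ for tomllib) is enough to pull the dependency list out of a script:

import re
import tomllib  # stdlib TOML parser, available from Python 3.11

# Reference regex from PEP 723 for locating a "# /// script" metadata block.
REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

def read(script: str) -> dict | None:
    """Return the parsed 'script' metadata block, or None if there isn't one."""
    name = 'script'
    matches = list(
        filter(lambda m: m.group('type') == name, re.finditer(REGEX, script))
    )
    if len(matches) > 1:
        raise ValueError(f'Multiple {name} blocks found')
    elif len(matches) == 1:
        # Strip the leading "# " comment prefix and parse the remainder as TOML.
        content = ''.join(
            line[2:] if line.startswith('# ') else line[1:]
            for line in matches[0].group('content').splitlines(keepends=True)
        )
        return tomllib.loads(content)
    else:
        return None

Running read() over the example script above returns {'requires-python': '>=3.11', 'dependencies': ['requests<3', 'rich']}.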
I'm guessing you've already heard of this, so what are the reasons I'm missing?
Thanks!
Are you talking about package and dependency management for a new pipeline? There's an implicit conflict between things being added at runtime (how these pipelines are "discovered" via directory) and at build time (when they are built as part of the repo itself).
Quoting from the same PEP document:
The metadata format is designed to be similar to the layout of data in the pyproject.toml file of a Python project directory, to provide a familiar experience for users who have experience writing Python projects. By using a similar format, we avoid unnecessary inconsistency between packaging tools, a common frustration expressed by users in the recent packaging survey.
This points to the build and packaging woes of developers. While it's not something to be brushed aside, it's non-trivial to scan for this metadata and trigger the right package manager operations (typically relegated to a Docker build process, which in turn calls a pip or Poetry process) -- please pardon the over-explanation if you're already aware of this.
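To make that concrete, a runtime installer would end up doing something like the sketch below (hypothetical directory layout and function names; it assumes a PEP 723 read() helper like the one sketched earlier in this thread and shells out to pip, which is exactly the kind of step normally confined to a Docker build):

import subprocess
import sys
from pathlib import Path

def install_inline_dependencies(pipeline_dir: str) -> None:
    """Scan discovered pipeline scripts and pip-install their inline dependencies."""
    for script in sorted(Path(pipeline_dir).glob("*.py")):
        metadata = read(script.read_text())  # PEP 723 reader from the earlier sketch
        if not metadata:
            continue
        deps = metadata.get("dependencies", [])
        if deps:
            # Installing into the running environment at discovery time -- the part
            # that's hard to automate safely outside of an image build.
            subprocess.run([sys.executable, "-m", "pip", "install", *deps], check=True)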
So for now it seems we're at an impasse for automating the process. I'm in the same boat with adding other complexity in sub-operations, but am choosing a different solution. It looks like a bit of a Rube Goldberg machine at first, but (1) I build a number of modular Docker images that (2) use FastAPI to expose a single "process" function, which (3) can be called à la microservices to execute the atomic operation within. It's a system we're just getting started with, but maybe something there can be leveraged for your project if it has heavy dependency requirements.
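For illustration only, each per-image service boils down to something like this minimal FastAPI sketch (the names and payload fields are made up, not our actual code):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ProcessRequest(BaseModel):
    input_path: str  # hypothetical payload field

class ProcessResponse(BaseModel):
    output_path: str

def run_operation(path: str) -> str:
    """Placeholder for the atomic operation this image's heavy dependencies support."""
    return path + ".out"

@app.post("/process", response_model=ProcessResponse)
def process(req: ProcessRequest) -> ProcessResponse:
    # Every modular image exposes exactly one operation behind the same endpoint,
    # so an orchestrator can call them interchangeably as microservices.
    return ProcessResponse(output_path=run_operation(req.input_path))

The point of the pattern is that each Docker image pins its own heavy dependencies at build time, so nothing has to be installed into the orchestrator's environment at runtime.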
The good news is that a lot of work has already gone into pipeline stand-up and making a pipeline an atomic API / microservice of its own, but there's still work to do to reach the tightly integrated build and automation tasks mentioned here.
Update: Looks like start.sh attempts to do a little of this now, and there's a pending pull request to do this as well. Maybe that'll get you where needed?