[FLINK-38231][python] Standardise use of uv for PyFlink building, testing & linting
What is the purpose of the change
With FLINK-37775 and FLINK-36900 we have started to use uv for managing Python testing environments and installing linting tools, as well as defining test/lint/typecheck dependencies in pyproject.toml.
However, the CI/CD scripts and developer documentation are in a bit of an in-between state. For example, we use uv to create Python virtual environments for tool installation, even though uv supports this natively via `uv run`, removing the need for a custom-made virtual environment.
This PR does the following:
- Uses `uv` in our CI/CD scripts and developer documentation where possible. For lint stages such as flake8 and mypy, for example, the dependencies for those checks are managed by `uv` via `uv run`. This means the install steps for those tools in `lint-python.sh` can be removed, as this is managed by `uv`.
- Uses the `tox-uv` extension to `tox` so that `tox` uses `uv` to create the correct Python environment for testing against various Python versions, rather than relying on premade virtual environments. This also means the old `install-command.sh` script is no longer needed.
- Replaces the `python setup.py` invocations used to build the `apache-flink` and `apache-flink-libraries` packages with `uv build`, taking advantage of build isolation and automatic build dependency management.
- Migrates static package metadata for `apache-flink` and `apache-flink-libraries` into their own `pyproject.toml` files, so they are treated as concrete projects by `uv`.
- Adds `./apache-flink-libraries` as a `uv` source so that, during development, the `apache-flink-libraries` package is automatically built (for example, when running `uv pip install -e .` in the `flink-python` project). This sidesteps the need to build and install the `apache-flink-libraries` dependency manually from source during local development.
- Changes the `build-wheels.sh` script to build the PyFlink wheels using `uv build --python <python-version>`. This, coupled with the `tox` changes, means that the `py_env` step of `lint-python.sh` (where we create venvs for supported Python versions) can be removed.
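As an illustration of the workflow the list above describes, the key `uv` invocations look roughly like the following (a sketch using uv's documented CLI; the exact arguments used in the scripts may differ):

```shell
# Run lint/typecheck tools inside a uv-managed environment. uv resolves
# the tool dependencies declared in pyproject.toml on the fly, so no
# separate "install flake8/mypy into a venv" step is needed.
uv run flake8 pyflink
uv run mypy pyflink

# Build an sdist and wheel with build isolation; uv fetches the build
# dependencies declared under [build-system] automatically, replacing
# the old "python setup.py" invocations.
uv build

# Build against a specific interpreter version, as build-wheels.sh now
# does for each supported Python version.
uv build --python 3.10
```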
Brief change log
- Migrated the `lint-python.sh` script to use `uv run` for running the lint, testing, typechecking and docs-building steps.
- Added `tox-uv` and bumped the `tox` dependency so that `tox` can create the virtualenvs it needs for running tests with `uv`.
- Updated building and testing scripts to use `uv build`, `uv run` and `uv pip` where possible.
- Added a section to the developer docs about building the PyFlink project using `uv`.
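For reference, the two configuration pieces mentioned above might look like the following sketch. The section names follow uv's and tox-uv's documented conventions; the concrete values in the `flink-python` files may differ, and the Python versions listed here are illustrative:

```toml
# Sketch of flink-python/pyproject.toml fragments (not the literal file).

[tool.uv.sources]
# Build apache-flink-libraries from the sibling directory during
# development instead of resolving it from PyPI.
apache-flink-libraries = { path = "./apache-flink-libraries" }

[tool.tox]
# With the tox-uv plugin installed, tox (>= 4) creates its per-version
# test environments with uv rather than virtualenv, so no premade venvs
# are required.
requires = ["tox>=4", "tox-uv"]
env_list = ["py39", "py310", "py311"]
```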
Verifying this change
This change is already covered by existing tests, such as PyFlink unit tests, end-to-end tests and running the build-wheels.sh script.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)
Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
CI report:
- aa2180bcaeddf15522cbf4e45ab9bef7227cd410 Azure: FAILURE
Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build
@dianfu @HuangXingBo While I'm digging around the build/releasing stuff - do you know why we build/publish a wheel and an sdist for apache-flink, but only an sdist for apache-flink-libraries?
@autophagy
- The purpose of `apache-flink-libraries`: the purpose of `apache-flink-libraries` is to split the JAR files (which are huge) into a separate project. Otherwise, when we release a new version of PyFlink, the total size of the artifacts is very large (each artifact contains the JAR files), about 2 GB or so, since there are multiple artifacts for each supported Python version and platform. PyPI has a limit on the size each project can use. We contacted PyPI to increase the project size limit multiple times before introducing `apache-flink-libraries`.
- Why there is only an sdist for `apache-flink-libraries`: since it only contains JAR files, an sdist is enough. Wheel packages are usually for Cython files, which are platform-dependent. Besides, the purpose of this project is to reduce the artifact size of each release; if we still published wheel packages, it would still take up too much space.
@dianfu Ah, makes sense! Thank you for the context 🙂