kedro icon indicating copy to clipboard operation
kedro copied to clipboard

[KED-2821] Convert all imports according to packaging specs in pyproject.toml

Open lorenabalan opened this issue 2 years ago • 3 comments

At kedro pipeline package time, check if there are any packaging specs in pyproject.toml (keys under [tool.kedro.pipeline.package]). If so, refactor all imports according to the aliasing made explicit in the manifest.

[tool.kedro.pipeline.package]
"pipelines.alpha" = {alias = beta}
"pipelines.alpha.utils" = {alias = sigma}
"other_package" = {alias = utilities, destination = "dist/libraries"}

In the example above, utils should be renamed to sigma (and any references inside it to alpha should refer to beta), alpha to beta (and any references inside it to utils should refer to sigma), etc.

❓ How to work out the ideal order?

❓ Would working on a shadow-copy of the whole (or part of the) project help here?

❓ Checkout how nested packages would work, e.g. the alpha and alpha.utils. Consider doing top-level packages first, and then the nested ones.

For more context see comments on the original issue, copied over below.

lorenabalan avatar Mar 01 '22 12:03 lorenabalan

We talked about the experience of packaging/pulling modular pipelines, and how we can actually think of it as micro-packaging any part of the Python code in a Kedro project, at arbitrary depth in the nested folder structure. We can think of pyproject.toml as an additional requirements.txt, which deals with "internal" dependencies.

Worked example detailed in https://github.com/kedro-org/kedro/issues/1304#issuecomment-1055387856.

Main ideas & proposed actions:

  • Switch from .whl to sdist (built distribution vs source distribution, which is what we actually need)

Our wheels currently look like two packages (one for tests, one for src), and because of that it wouldn't be guaranteed to work with any other Python package. The less customisation we have the better. sdist seems like the appropriate standard for this (sharing source code), and tests can be specified in sdist's metadata, rather than another package. Users can generate a wheel file from sdist, which makes it more flexible if they want to share wheels or source code with the client.

  • Allow alias to be a Python module path ↔️ can package any subsection of the project
  • Make packaging work with any path relative to src/<package_name>, i.e. not just pipelines. ↔️ can package utilities
  • Aliasing improvements
    • Phase I: Look at import paths: if self-contained to that module, package; else, raise error (upstream importing is not allowed).
    • Phase II (on top of phase I): Look at import paths: if in pyproject.toml, rework imports to new location/name; else, raise error.

Out of scope: When a path is added to the PYTHONPATH and the imports look like 3rd party, when they're actually from the local project, somewhere upstream. We treat it as third party and that's it.

Open question: ❓ How to declare the explicit dependency between micro-packages ❓ (also how do we know if the second micro-package should come as source code or directly as packaged asset?)

lorenabalan avatar Mar 01 '22 12:03 lorenabalan

Translating whiteboard exercise

In a tree like:

<package_name>
|__ pipelines
     |__ clean_pipeline
          |__ nullify
               |__ pipeline.py
               |__ nodes.py
|__ utils
     |__ features
     |__ cleaning
           |__ impute
                |__ main.py
                |__ sth.py
           |__ nullify
                |__ func.py
                |__ sth.py

With a pyproject.toml like:

[tool.kedro.pipeline.package]
"utils.cleaning.impute" = {alias = "banking_impute"}
"utils.cleaning.nullify" = {alias = "banking_nullify"}

[tool.kedro.pipeline.pull]
"banking_impute" = {alias = "pkg.utils.cleaning"}

If we try to package nullify from clean_pipeline (i.e. pipelines.clean_pipeline.nullify), we look at the imports inside:

from pkg.pipelines.clean_pipeline.nulify import nodes  # can package, no problem
from pkg.utils.cleaning.impute import main  # turn into `from banking_impute import main`, then package
from pkg.utils.features.windowing import daily  # can't package, needs to be at least in `pyproject.toml`

As mentioned before, upstream imports are not allowed. In practice: if we try to package utils.cleaning.impute, the following import would make the packaging workflow error out:

from pkg.utils import another_utility

lorenabalan avatar Mar 01 '22 12:03 lorenabalan

Open question: ❓ How to declare the explicit dependency between micro-packages ❓ (also how do we know if the second micro-package should come as source code or directly as packaged asset?)

This question is being addressed separately in #1478

AhdraMeraliQB avatar May 17 '22 10:05 AhdraMeraliQB

This issue was discussed in Technical Design on 14/12

  • In https://github.com/kedro-org/kedro/pull/2119 it was discovered that the feature to add micropkg specs in pyproject.toml hasn't worked since March 2022. Nobody has reported this.
  • The team agreed that before implementing more advanced functionality to the micropkg specs in pyproject.toml, we need to investigate if users are using this functionality. https://github.com/kedro-org/kedro/issues/2123
  • It would also be worth talking to the Alloy team and hearing how they do packaging and if they find the micro-packaging functionality in Kedro useful or not.

merelcht avatar Dec 14 '22 14:12 merelcht

Is this whole import mangling worth the effort? Maybe we could tell users "refactor common code/functions that are not in the pipeline itself as a separate package, then declare it as a dependency".

(Simplistic question meant to spark discussion and, in line with what @merelcht said right above, talk to our users about the micropackaging workflow).

astrojuanlu avatar May 08 '23 13:05 astrojuanlu

I think we need to decide what to do with the micropackaging feature as a whole before introducing any more features like the suggested one here.

merelcht avatar Dec 18 '23 15:12 merelcht

Closing for now, we can always revisit in the future

astrojuanlu avatar Dec 18 '23 15:12 astrojuanlu