kedro micropkg pull from pypi is not working in 0.18.2
Description
A micro-packaged pipeline that has been uploaded to a PyPI repository cannot be pulled successfully.
We have been using kedro very actively, so thanks for your support with this issue!
Context
I was trying to push and pull a micro-package via a private PyPI repository to share and reuse pipelines.
Steps to Reproduce
- kedro pipeline create <pipeline_name>
- kedro micropkg package pipeline.<pipeline_name>
- twine upload dist/*
- kedro micropkg pull <pipeline_name>
Expected Result
The micro-package should be downloaded via pip and successfully pulled via kedro micropkg pull
Actual Result
The micro-package is downloaded via pip, but there seems to be an issue here: https://github.com/kedro-org/kedro/blob/4843fadfc887c410bb03ba3e54f6737019eeef3b/kedro/framework/cli/micropkg.py#L147
Path(package_path).name is expected to be the name of a tar.gz file, but in my case it is actually the name of the pip package that was downloaded. As a result, egg_info_file will be an empty list.
This is the error:
kedro.framework.cli.utils.KedroCliError: More than 1 or no egg-info files found from <pipeline_name>. There has to be exactly one egg-info directory.
I can bypass the issue by running this:
- python -m pip download --no-deps --dest <destination> <pipeline_name>
- Copying the tar.gz path of the downloaded package (<micro-package-path>)
- kedro micropkg pull <micro-package-path>
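The workaround above could be scripted roughly as follows. This is only a sketch: find_sdist and pull_from_index are hypothetical helper names, and it assumes the destination directory is otherwise empty, that pip downloads a .tar.gz archive rather than a wheel, and that the script runs from the Kedro project root with kedro on the PATH.

```python
import subprocess
import sys
from pathlib import Path


def find_sdist(destination: Path) -> Path:
    """Return the single .tar.gz archive in destination, or raise."""
    archives = sorted(destination.glob("*.tar.gz"))
    if len(archives) != 1:
        raise FileNotFoundError(
            f"expected exactly one .tar.gz in {destination}, found {len(archives)}"
        )
    return archives[0]


def pull_from_index(package_name: str, destination: Path) -> None:
    """Download the package with pip, then pull the archive by explicit path."""
    # Step 1: same pip invocation as in the manual workaround.
    subprocess.run(
        [sys.executable, "-m", "pip", "download", "--no-deps",
         "--dest", str(destination), package_name],
        check=True,
    )
    # Steps 2 and 3: locate the archive and hand its path to kedro.
    subprocess.run(
        ["kedro", "micropkg", "pull", str(find_sdist(destination))],
        check=True,
    )
```

Calling pull_from_index("<pipeline_name>", Path("<destination>")) would then replace the three manual steps.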
Your Environment
- Kedro version used (pip show kedro or kedro -V): 0.18.2
- Python version used (python -V): 3.9.12
- Operating system and version: Ubuntu 20.04.3 and MacOS 12.5
I think I know (part of) the problem: .rstrip(".tar.gz") does not remove the suffix; it strips any trailing characters from the set '.', 'a', 'g', 'r', 't', 'z', which in the case of our package removes a letter from the name.
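To illustrate the difference, here is a small sketch with made-up package names (data-target and data_target-0.1.tar.gz are not the real package). strip_tar_gz is a hypothetical helper, not something from the kedro code base; on Python 3.9+ str.removesuffix(".tar.gz") would do the same job.

```python
def strip_tar_gz(name: str) -> str:
    """Remove a trailing '.tar.gz' suffix, leaving other names unchanged."""
    suffix = ".tar.gz"
    # Slice off the suffix only when the name really ends with it.
    return name[: -len(suffix)] if name.endswith(suffix) else name


# str.rstrip treats its argument as a set of characters, not as a suffix.
print("data-target".rstrip(".tar.gz"))             # the trailing 't' is stripped too
print("data_target-0.1.tar.gz".rstrip(".tar.gz"))  # the '1' stops the stripping
print(strip_tar_gz("data-target"))
print(strip_tar_gz("data_target-0.1.tar.gz"))
```

The second call only looks correct because the version number happens to end in a character outside the stripped set.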
Also, the version seems to be missing. I put a breakpoint there, and at that point there is a directory named package_name-0.1 and not package-name. Note also that the real directory name contains an underscore _, but this line expects a hyphen - (sorry for the noise, I am putting what I found here in case I do not have time to make a PR).
- In the case of a .tar.gz file, the package_name is correctly the name of the package with underscores and the version (which is what stops the .rstrip from stripping the package name's letters), which is the name of the untarred folder (not sure this is a word).
- In the case of a pip package, the package_name is the name of the package with dashes - and no version.
They cannot be treated the same way here.
Maybe use list(temp_dir_path.rglob("*.egg-info")), which may work in both cases?
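A rough sketch of that idea, using a temporary directory with a made-up layout (my_pipeline-0.1/my_pipeline.egg-info is an assumption about what the unpacked archive looks like, and find_egg_info is a hypothetical helper name). The point is that a recursive search does not need to know the folder name, the version, or whether the name uses - or _.

```python
import tempfile
from pathlib import Path
from typing import List


def find_egg_info(root: Path) -> List[Path]:
    """Return every '*.egg-info' entry found anywhere below root."""
    return sorted(root.rglob("*.egg-info"))


with tempfile.TemporaryDirectory() as tmp:
    temp_dir_path = Path(tmp)
    # Assumed layout of an unpacked sdist: versioned folder, underscore name.
    (temp_dir_path / "my_pipeline-0.1" / "my_pipeline.egg-info").mkdir(parents=True)
    found_names = [p.name for p in find_egg_info(temp_dir_path)]
    print(found_names)
```

The existing "exactly one egg-info" check could then be applied to the returned list as before.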
👍 thanks