kedro micropkg pull from pypi is not working in 0.18.2

Open daniel-ressi opened this issue 2 years ago • 2 comments

Description

A micro-packaged pipeline that has been uploaded to a PyPI repository cannot be pulled successfully.

We have been using kedro very actively, so thanks for your support with this issue!

Context

I was trying to push and pull a micro-package via a private PyPI repository to share and reuse pipelines.

Steps to Reproduce

  1. kedro pipeline create <pipeline_name>
  2. kedro micropkg package pipeline.<pipeline_name>
  3. twine upload dist/*
  4. kedro micropkg pull <pipeline_name>

Expected Result

The micro-package should be downloaded via pip and then pulled successfully via kedro micropkg pull.

Actual Result

The micro-package is downloaded via pip, but there seems to be an issue here: https://github.com/kedro-org/kedro/blob/4843fadfc887c410bb03ba3e54f6737019eeef3b/kedro/framework/cli/micropkg.py#L147

Path(package_path).name is expected to be the name of a tar.gz file, but in my case it is the name of the pip package that was downloaded. As a result, egg_info_file will be an empty list.

This is the error:

kedro.framework.cli.utils.KedroCliError: More than 1 or no egg-info files found from <pipeline_name>. There has to be exactly one egg-info directory.

I can bypass the issue by running this:

  1. python -m pip download --no-deps --dest <destination> <pipeline_name>
  2. Copying the tar.gz path of the downloaded package <micro-package-path>
  3. kedro micropkg pull <micro-package-path>

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.18.2
  • Python version used (python -V): 3.9.12
  • Operating system and version: Ubuntu 20.04.3 and macOS 12.5

daniel-ressi avatar Aug 25 '22 14:08 daniel-ressi

I think I know (part of) the problem: .rstrip(".tar.gz") does not remove the suffix. It strips any trailing characters that belong to the set '.', 'a', 'g', 'r', 't', 'z', which in the case of our package also removes a letter from the package name.

Also, the version seems to be missing. I put a breakpoint, and at this point there is a directory named package_name-0.1, not package-name. Note also that the real directory name contains an underscore _, but this line expects a hyphen -.
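To illustrate the rstrip behaviour, here is a minimal sketch. The package name data_engineering is made up for the example, and strip_suffix is a hypothetical helper, not something from the kedro code base.

```python
def strip_suffix(name: str, suffix: str = ".tar.gz") -> str:
    """Remove `suffix` from the end of `name` only if it is really there."""
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name


# Hypothetical archive name, without a version in it.
sdist = "data_engineering.tar.gz"

# rstrip treats its argument as a set of characters, so it keeps removing
# trailing characters as long as they are one of '.', 't', 'a', 'r', 'g', 'z'.
naive = sdist.rstrip(".tar.gz")

# The helper removes the literal suffix and nothing else.
safe = strip_suffix(sdist)

# With a version in the name, the digit stops rstrip before the package name.
versioned = "data_engineering-0.1.tar.gz".rstrip(".tar.gz")

print(naive, safe, versioned)
```

On Python 3.9 and later, str.removesuffix(".tar.gz") does the same job as the helper.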

FlorianGD avatar Sep 12 '22 13:09 FlorianGD

(sorry for the noise, I put what I found here in case I do not have time to make a PR).

  • In the case of a .tar.gz file, package_name is the package name with underscores plus the version, which matches the name of the extracted folder. The version is also what stops .rstrip from stripping letters off the package name.
  • In the case of a pip package, package_name is the package name with dashes - and no version.

They cannot be treated the same way here.

Maybe use list(temp_dir_path.rglob("*.egg-info")) which may work in both cases?
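A rough sketch of what that could look like, assuming the extracted sources end up somewhere under a temporary directory. The function name find_egg_info and the directory layout in the demo are invented for illustration, and this is not the actual kedro implementation.

```python
import tempfile
from pathlib import Path


def find_egg_info(root: Path) -> Path:
    """Return the single *.egg-info directory found anywhere under `root`."""
    matches = [p for p in root.rglob("*.egg-info") if p.is_dir()]
    if len(matches) != 1:
        raise RuntimeError(
            f"Expected exactly one egg-info directory under {root}, "
            f"found {len(matches)}."
        )
    return matches[0]


# Demo with a made-up layout: the search does not depend on whether the
# top-level folder is named with dashes, underscores, or a version.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my_pipeline-0.1" / "my_pipeline.egg-info").mkdir(parents=True)
    found_name = find_egg_info(root).name

print(found_name)
```

Because the lookup is recursive, it no longer needs to rebuild the folder name from package_name, which is where the two cases diverge.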

FlorianGD avatar Sep 12 '22 14:09 FlorianGD

👍 thanks

daniel-ressi avatar Sep 30 '22 13:09 daniel-ressi