
Kedro-dataset release process

Open noklam opened this issue 2 years ago • 9 comments

Introduction

How are we going to release when certain libraries are not compatible? For example, if tensorflow has no support for Python 3.11, how do we handle this in our CI?

Background

Since the separation of kedro-datasets, it's now possible to upgrade kedro and kedro-datasets separately. Prior to this, kedro was always compatible with all datasets, so we didn't have this challenge before.

Problem

  • How do we make our CI work and allow certain DataSets to skip CI?
  • Should the user always install the latest version?
    • For example, let's say version 1.0.10 supports Python 3.10 for Tensorflow and 1.0.11 adds support for Python 3.11. In theory, if users are on Python < 3.11, it would not be a problem for them to install 1.0.11.

Possible Solution

  • We could create some kind of tag/decorator to skip tests at the "file" or "module" level. It may get a little bit messy (see the sketch below for one possible approach).
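
A minimal sketch of what a file-level skip could look like in a `conftest.py`, assuming a pytest-based test suite; the paths and the tensorflow example are illustrative, not the actual kedro-plugins layout:

```python
# conftest.py (illustrative sketch, not the real kedro-plugins CI configuration).
# Skip collecting whole test files whose optional dependency cannot be imported,
# e.g. when tensorflow publishes no wheel for the Python version this CI job uses.
import importlib.util

collect_ignore = []
if importlib.util.find_spec("tensorflow") is None:
    collect_ignore.append("tensorflow/test_tensorflow_model_dataset.py")
```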

noklam avatar Mar 16 '23 19:03 noklam

Pre-requisite of

  • #2048

noklam avatar Mar 17 '23 13:03 noklam

I didn't think this through 100% but maybe the whole point of having a meta-package like this is that we test it cohesively for certain versions of Python and dependencies? Otherwise maybe it would be better to just have each dataset in a separate package to avoid this conundrum, at the cost of increasing the overhead a bit.

astrojuanlu avatar Mar 20 '23 15:03 astrojuanlu

Related: kedro-org/kedro#2417

noklam avatar Mar 22 '23 15:03 noklam

These days I'm working more with kedro-datasets and I'm feeling the pain of installing all the dependencies myself, so I understand where this frustration comes from.

But if we're packaging it as a single project in PyPI... I stand by my point, we should validate it as a whole.

astrojuanlu avatar Mar 22 '23 15:03 astrojuanlu

@astrojuanlu How would we validate it as a whole? In this case, if tensorflow never releases a Python 3.11 version, do we just not release newer versions, or do we bump the semantic version every time some library drops support and be okay with the rest?

This may be off-topic.

I was checking out examples of pyproject.toml yesterday and finding inspiration from pandas. They did not expose `optional-dependencies` to PyPI (at least I couldn't figure out a way to do `pip install pandas[hdf]` or equivalent). https://pandas.pydata.org/docs/getting_started/install.html

Instead, they take a more passive path: they just provide a list of optional dependencies, and you see the error and do the install yourself. https://github.com/pandas-dev/pandas/blob/5c155883fdc5059ee7e21b20604a021d3aa92b01/pyproject.toml#L58
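
In code, that "passive" path is roughly a lazy import with a helpful error; the function below is a hypothetical illustration, not pandas' or kedro-datasets' actual implementation:

```python
# Hypothetical sketch of the passive optional-dependency pattern: the heavy
# library is imported only when needed, and the error tells the user what to
# install rather than the package declaring it as a hard dependency.
def _load_tensorflow():
    try:
        import tensorflow as tf
    except ImportError as exc:
        raise ImportError(
            "This dataset requires the optional dependency 'tensorflow'. "
            "Install it yourself, e.g. `pip install tensorflow`."
        ) from exc
    return tf
```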

noklam avatar Mar 22 '23 15:03 noklam

How would we validate it as a whole? In this case, if tensorflow never releases a Python 3.11 version, do we just not release newer versions, or do we bump the semantic version every time some library drops support and be okay with the rest?

This may be off-topic.

I share your concerns, it's just hard. But maybe it's an excuse to consider unbundling kedro-datasets into different packages.

Instead, they take a more passive path: they just provide a list of optional dependencies, and you see the error and do the install yourself.

Yeah I think pandas does a great job at offering these optional dependencies as "progressive enhancement". But kedro-datasets is a collection of disjoint things, so I'm not sure they can be compared on equal grounds.

I know I'm not being very helpful, sorry about that 😬 My point is that I think we're trying to solve a problem that is just very hard, potentially introducing lots of complexity in our tests and CI and import mechanisms (see also kedro-org/kedro#138).

astrojuanlu avatar Mar 22 '23 15:03 astrojuanlu

5 months in, do you think there are any outstanding pain points we should address?

astrojuanlu avatar Aug 22 '23 13:08 astrojuanlu

@astrojuanlu

  1. We have SnowparkTableDataSet, which supports Python 3.8 only (thus building the docs only works on Python 3.8; if one day dataset A works on Python 3.8 only and dataset B works on Python 3.9 only, then we have a new problem).
  2. There are no tests written for it, so the problem has not surfaced.
  3. We may eventually have tests that need to be run conditionally on a specific Python version (a sketch follows at the end of this comment).

I wouldn't close the issue, but it seems that it is not causing any problem so we may just leave it for now.
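
For the conditional tests in point 3, here is a sketch of a Python-version-gated skip with pytest; the module placement and reason string are illustrative, and the real Snowpark test module may look different:

```python
# Top of the Snowpark test module (illustrative sketch). Every test in the file
# is skipped on interpreters newer than 3.8, which is the only version
# SnowparkTableDataSet supports at the time of this discussion.
import sys

import pytest

pytestmark = pytest.mark.skipif(
    sys.version_info >= (3, 9),
    reason="snowflake-snowpark-python only supports Python 3.8",
)
```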

noklam avatar Aug 22 '23 13:08 noklam

We have SnowparkTableDataSet, which supports Python 3.8 only (thus building the docs only works on Python 3.8; if one day dataset A works on Python 3.8 only and dataset B works on Python 3.9 only, then we have a new problem).

Yeah I think special cases like these merit having a separate package. Otherwise kedro-datasets will soon become a dumpster fire.

astrojuanlu avatar Aug 22 '23 13:08 astrojuanlu