kedro-plugins icon indicating copy to clipboard operation
kedro-plugins copied to clipboard

build(datasets): Make sure experimental datasets are packaged as part of `kedro-datasets`

Open merelcht opened this issue 10 months ago • 2 comments

Description

Closes #653

Development notes

Possible solutions:

Solution 1

  • https://github.com/kedro-org/kedro-plugins/pull/654/commits/a17cb8a64412506745f0c39043e9be010b25c743
  • To use an experimental dataset users can just do from experimental.<...> import <...>

✅ Easy to implement packaging ✅ Position of the experimental directory makes it very clear that the datasets inside it aren't part of the core datasets ❌ experimental is a very generic name and it wouldn't be clear from the import that this is Kedro related

Solution 2

  • https://github.com/kedro-org/kedro-plugins/pull/654/commits/aab299b14b8e440fe97f153944ada2925a478304
  • Rename the experimental directory to kedro_datasets_experimental
  • The import will then become from kedro_datasets_experimental.<...> import <...>

✅ Easy to implement packaging ✅ Position of the kedro_datasets_experimental directory makes it very clear that the datasets inside it aren't part of the core datasets ✅ Kedro is in the name of the import

Solution 3

  • Move the experimental directory inside the kedro_datasets directory
  • The import will then become from kedro_datasets.experimental.<...> import <...>

✅ Easy to implement ✅ Kedro is in the name of the import ❌ Position of the experimental directory makes it less clear that this is not just a core dataset, but the directory that contains the experimental datasets

Solution 4

  • Turn kedro_datasets and experimental into namespaced packages: https://packaging.python.org/en/latest/guides/packaging-namespace-packages/

✅ Very clear separation ❌ Harder to implement and also requires more setup for releasing both packages.

Checklist

  • [ ] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • [ ] Updated the documentation to reflect the code changes
  • [ ] Added a description of this change in the relevant RELEASE.md file
  • [ ] Added tests to cover my changes

merelcht avatar Apr 16 '24 13:04 merelcht

Yes that's what I wrote in the PR description. I didn't want to put the experimental directory inside the kedro_datasets one because then it just looks like it's one of the many types of datasets. But I see the point that it wouldn't be immediately clear that this is Kedro related import if it comes straight from experimental. Is it possible to get the import to point to kedro_datasets.experimental without actually moving the directory?

merelcht avatar Apr 18 '24 08:04 merelcht

Is it possible to get the import to point to kedro_datasets.experimental without actually moving the directory?

I'm not sure. Worth investigating

astrojuanlu avatar Apr 18 '24 08:04 astrojuanlu

I've updated the implementation and description.

merelcht avatar May 13 '24 15:05 merelcht

This solution makes sense to me, just one question - do you need to update the pyproject.toml too with this approach, like in https://github.com/kedro-org/kedro-plugins/commit/a17cb8a64412506745f0c39043e9be010b25c743?

ankatiyar avatar May 14 '24 13:05 ankatiyar

This solution makes sense to me, just one question - do you need to update the pyproject.toml too with this approach, like in a17cb8a?

I guess since the final name is kedro_datasets_experimental (initially it was experimental) it falls into the existing pattern:

[tool.setuptools.packages.find]
include = ["kedro_datasets*"]

ElenaKhaustova avatar May 14 '24 14:05 ElenaKhaustova

Love this solution! ⭐

astrojuanlu avatar May 14 '24 17:05 astrojuanlu