kedro-plugins
kedro-plugins copied to clipboard
build(datasets): Make sure experimental datasets are packaged as part of `kedro-datasets`
Description
Closes #653
Development notes
Possible solutions:
Solution 1
- https://github.com/kedro-org/kedro-plugins/pull/654/commits/a17cb8a64412506745f0c39043e9be010b25c743
- To use an experimental dataset users can just do
from experimental.<...> import <...>
✅ Easy to implement packaging
✅ Position of the experimental
directory makes it very clear that the datasets inside it aren't part of the core datasets
❌ experimental
is a very generic name and it wouldn't be clear from the import that this is Kedro related
Solution 2
- https://github.com/kedro-org/kedro-plugins/pull/654/commits/aab299b14b8e440fe97f153944ada2925a478304
- Rename the
experimental
directory tokedro_datasets_experimental
- The import will then become
from kedro_datasets_experimental.<...> import <...>
✅ Easy to implement packaging
✅ Position of the kedro_datasets_experimental
directory makes it very clear that the datasets inside it aren't part of the core datasets
✅ Kedro is in the name of the import
Solution 3
- Move the
experimental
directory inside thekedro_datasets
directory - The import will then become
from kedro_datasets.experimental.<...> import <...>
✅ Easy to implement
✅ Kedro is in the name of the import
❌ Position of the experimental
directory makes it less clear that this is not just a core dataset, but the directory that contains the experimental datasets
Solution 4
- Turn
kedro_datasets
andexperimental
into namespaced packages: https://packaging.python.org/en/latest/guides/packaging-namespace-packages/
✅ Very clear separation ❌ Harder to implement and also requires more setup for releasing both packages.
Checklist
- [ ] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [ ] Updated the documentation to reflect the code changes
- [ ] Added a description of this change in the relevant
RELEASE.md
file - [ ] Added tests to cover my changes
Yes that's what I wrote in the PR description. I didn't want to put the experimental
directory inside the kedro_datasets
one because then it just looks like it's one of the many types of datasets. But I see the point that it wouldn't be immediately clear that this is Kedro related import if it comes straight from experimental
. Is it possible to get the import to point to kedro_datasets.experimental
without actually moving the directory?
Is it possible to get the import to point to kedro_datasets.experimental without actually moving the directory?
I'm not sure. Worth investigating
I've updated the implementation and description.
This solution makes sense to me, just one question - do you need to update the pyproject.toml
too with this approach, like in https://github.com/kedro-org/kedro-plugins/commit/a17cb8a64412506745f0c39043e9be010b25c743?
This solution makes sense to me, just one question - do you need to update the
pyproject.toml
too with this approach, like in a17cb8a?
I guess since the final name is kedro_datasets_experimental
(initially it was experimental
) it falls into the existing pattern:
[tool.setuptools.packages.find]
include = ["kedro_datasets*"]
Love this solution! ⭐