Run `kedro new` without creating a new directory
Description & Context
Currently, when running `kedro new`, I have to specify a folder where my project is created. It doesn't make sense when I use venv or Poetry with an in-project venv, because I have to create a directory myself anyway to init a venv and install Kedro there. As a result, I have to manually move the new project to the parent folder with `mv ~/repos/project_name/project_name/* ~/repos/project_name/`.
Possible Implementation
Add flag to skip creating a folder.
Possible Alternatives
Ask about it during init.
I find it a bit awkward as well that you need to have kedro installed before you have created your project and likely have created your environment. Is it possible to give a cookie-cutter command alternative?
Potentially related to this chicken/egg situation: you need to have kedro installed (potentially globally) for project setup, and that may or may not be the version you are looking for in the project. Is it possible to achieve the same results as `kedro install` with `pip install -e .`?
Users may be onboarding to a new project in `0.16.x` and simply grab the latest version off of PyPI (`pip install kedro`) before running `kedro install`.
Other than running `pip install -r requirements.txt` or reading requirements.txt for the specific version of kedro, how should users know how to set up the project for local development?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bump
Hey @jaklan thanks for the suggestion! It sounds like it might be related to https://github.com/cookiecutter/cookiecutter/issues/909 & https://github.com/cookiecutter/cookiecutter/pull/907 - I suggest you also express your interest there, maybe it'll speed it along for the next `cookiecutter` release. I think it'd be better for them to address it rather than us building a custom solution, but I'll leave this open for a while to gauge interest on the feature request. 🤔
@WaylonWalker yes that's a valid point, though I'm not sure how it relates to the original question? It feels to me like it deserves its own separate discussion.
I was oblivious to this issue because I was using out-of-tree environments with conda/mamba, but a colleague who just tried to use Kedro with `venv` experienced the same confusion. Here is some insight into in-tree (local) vs out-of-tree (global) environment workflows: https://snarky.ca/classifying-python-virtual-environment-workflows/
Notice we already have some documentation about using `venv` or `Pipenv` instead of conda https://kedro.readthedocs.io/en/stable/faq/faq.html#can-i-create-a-virtual-environment-without-conda although I think we could make it a bit clearer (https://github.com/kedro-org/kedro/issues/2360).
I think it would be good to tag this as "Won't fix" (since it's unlikely that the upstream issue in cookiecutter is ever addressed). I'm going to go ahead and do it, otherwise folks feel free to reverse my decision cc @AntonyMilneQB
Another side effect of not being able to init a Kedro project in the current directory: users create weird structures with 2 READMEs for nothing, like https://github.com/pablovdcf/TFM_HADO_Cares
I know this issue was closed 2+ years ago but honestly it was basically the first pain point I encountered https://github.com/kedro-org/kedro/issues/2360 and it keeps coming up over and over again. I'm reopening so that we can reprioritize.
This is needed to have venv/virtualenv as first-class citizens in the Kedro installation instructions I think (otherwise the workflow is too weird).
An informal poll run by @/brettcannon showed that the majority of developers store virtual environments next to their project code. https://snarky.ca/classifying-python-virtual-environment-workflows/
I wonder what an equivalent poll result would look like for Kedro users. I suspect it would be much more biased towards global/central directory due to the prevalence of conda.
The point definitely still stands and it would be great if there were a better way to handle in-tree environments, but I would just be cautious assigning priority based on the above poll. It might be worth even doing the same sort of poll for kedro users (or potential kedro users I guess, because there's a selection bias as those who end up using kedro are more likely to be those who had a smooth experience using global environments). Maybe some such poll already exists for data scientists/similar.
@antonymilne there's no need to overthink that topic and refer to any polls - Kedro should allow people to generate a project in the current directory, period. In-project venvs are absolutely common in the Python ecosystem and they should be simply supported. Especially taking into consideration the fact you can solve the issue with one command moved to Kedro internals.
How would such a function be implemented? If I understand correctly, cookiecutter still doesn't support this today, so it needs to be implemented in Kedro.
Do we need to handle edge cases with existing files? Assuming an empty folder will be easy, would this be good enough?
@noklam I have already answered that in the initial issue (more than 2 years ago btw...) - you can mimic `mv` inside Kedro.
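A minimal sketch of what mimicking `mv` could look like, assuming the project has already been generated into a nested folder. The function name `flatten_project` is hypothetical and not part of Kedro or cookiecutter. Unlike a shell `mv nested/*`, it also picks up dotfiles, and it refuses to start if any top-level name already exists in the parent folder.

```python
import shutil
from pathlib import Path


def flatten_project(nested_dir: Path) -> None:
    """Move the contents of nested_dir one level up, then remove nested_dir.

    Hypothetical helper: roughly `mv nested_dir/* parent/` plus dotfiles.
    """
    nested_dir = Path(nested_dir)
    parent = nested_dir.parent
    children = sorted(nested_dir.iterdir())

    # Check everything up front so a clash never leaves a half-moved project.
    clashes = [c.name for c in children if (parent / c.name).exists()]
    if clashes:
        raise FileExistsError(f"Already present in {parent}: {', '.join(clashes)}")

    for child in children:
        shutil.move(str(child), str(parent / child.name))

    # The nested folder is empty now, so rmdir succeeds.
    nested_dir.rmdir()
```

With the layout from the original report, `flatten_project(Path("~/repos/project_name/project_name").expanduser())` would leave the project files next to an existing `.venv`.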
> Assuming an empty folder will be easy, would this be good enough?

Of course not, because the whole discussion is about generating a project in the current directory, where you already have `.venv` created (and probably other files, like the Poetry-specific ones, as well).
> Do we need to handle edge cases with existing files?

You can simply display a proper warning when running `kedro new` with e.g. a `--cwd` flag and then wait for user confirmation. Of course it can be more sophisticated and analyse whether any files will be overwritten, but that's the easiest approach to start with.
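A rough sketch of that warn-and-confirm step, using only the standard library. The function name `confirm_generate_in_cwd` and the prompt wording are assumptions for illustration, and the `--cwd` flag itself is only the proposal from this comment, not an existing `kedro new` option. The `ask` parameter defaults to `input` so the prompt can be replaced in tests.

```python
from pathlib import Path
from typing import Callable


def confirm_generate_in_cwd(target: Path, ask: Callable[[str], str] = input) -> bool:
    """Return True if the project may be generated directly into target.

    Hypothetical helper: an empty directory is accepted silently, a
    non-empty one triggers a warning and requires an explicit "y"/"yes".
    """
    existing = sorted(p.name for p in Path(target).iterdir())
    if not existing:
        return True  # nothing in the directory, nothing to lose

    print(f"Warning: {target} is not empty ({len(existing)} entries, e.g. {existing[0]}).")
    print("Existing files with the same names as generated ones may be overwritten.")
    answer = ask("Continue? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on an empty answer keeps the accidental Enter key press safe.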
Generally, I see there's also another issue about Poetry itself: https://github.com/kedro-org/kedro/issues/1722, so you need to implement a mechanism to move files anyway if you really want to support it. But there's also another approach - utilise a different, globally installed CLI tool to initialise Kedro projects - e.g. `kedro-starter`. This way you avoid the chicken & egg problem.
Indeed, cookiecutter does not support this, nothing has changed since https://github.com/kedro-org/kedro/issues/681#issuecomment-823427629
Notice that cookiecutter already has "override/fail if exists" functionality, it's just that it always creates a subdirectory. We'll probably have to move files around.
We could start with a conservative stance, like "if any of the files I'm going to create already exists, fail". But this is easier said than done, because then `kedro new` would need knowledge of the cookiecutter structure https://github.com/cookiecutter/cookiecutter/issues/1004
It's actually easier to blindly copy everything over, but this poses a data loss risk.
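One possible way around needing knowledge of the template structure, offered as an assumption rather than anything cookiecutter or Kedro provides today: render the project into a temporary directory first, then compare the rendered tree against the target before copying. The names `find_conflicts` and `copy_if_no_conflicts` are hypothetical, and the rendering step itself is left out of this sketch.

```python
import shutil
from pathlib import Path


def find_conflicts(rendered: Path, target: Path) -> list[str]:
    """List rendered paths (relative, POSIX style) that clash with target."""
    conflicts = []
    for path in Path(rendered).rglob("*"):
        rel = path.relative_to(rendered)
        dest = Path(target) / rel
        if path.is_dir():
            # Merging into an existing directory is fine; a file in its place is not.
            clash = dest.exists() and not dest.is_dir()
        else:
            clash = dest.exists()
        if clash:
            conflicts.append(rel.as_posix())
    return sorted(conflicts)


def copy_if_no_conflicts(rendered: Path, target: Path) -> None:
    """Copy the rendered tree into target, failing before touching anything."""
    conflicts = find_conflicts(rendered, target)
    if conflicts:
        raise FileExistsError("Refusing to overwrite: " + ", ".join(conflicts))
    # dirs_exist_ok needs Python 3.8+; existing files such as .venv are left alone.
    shutil.copytree(rendered, target, dirs_exist_ok=True)
```

Because the check runs on the rendered output, it does not depend on how the template is laid out, at the cost of rendering the project once before knowing whether it can be written.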
copier handles this beautifully, but refactoring kedro away from cookiecutter would be painful:
I don't think this is impossible though. Once we agree this is needed, we'll have to think carefully about the path of least resistance.
I never addressed @antonymilne's point:
> I wonder what an equivalent poll result would look like for Kedro users. I suspect it would be much more biased towards global/central directory due to the prevalence of conda.
And also because we are hiding our venv instructions behind a collapsible menu + the ergonomics are really weird:
so I wouldn't be surprised if the current users got sort of used to it. But that's the key trap we have to avoid, and you spelled it out already:
> (or potential kedro users I guess, because there's a selection bias as those who end up using kedro are more likely to be those who had a smooth experience using global environments)
I can run an informal poll in Slack and see what people think.
On one hand, our over-reliance on conda creates some trouble for certain users. For example, here is a user that is struggling to install a compatible version of Kedro on Python 3.8 because of the pip and setuptools constraints https://linen-slack.kedro.org/t/16034230/hello-i-have-created-a-kedro-matlab-custom-dataset-which-i-w#20e4ffe4-e697-47fa-a722-d74a752b7bed
On the other hand, as much as I'd like this to happen, I'm reconsidering how impactful the change would be, because according to an informal survey I ran on Slack, the main annoyance seems to be that users have a "global" Kedro and a project-specific Kedro https://linen-slack.kedro.org/t/16040768/u05bdslpj72-finally-gave-the-steps-in-https-kedro-org-slack-#db2c34ed-43bf-4479-839c-5a4fb4154a10
So much so that some users don't use `kedro new` at all and rely on `cookiecutter` directly ❗ https://linen-slack.kedro.org/t/16031681/hello-here-wave-skin-tone-3-i-come-with-a-thorny-question-to#e0833bfb-336a-4e7c-acef-948cc0146694 cc @inigohidalgo and this is tricky because I don't know what the implications of the new add-ons flow are for this workflow.
However, this points to a new interesting direction that might have even more impact: making `kedro-new` a non-mandatory plugin. That's a large change that will need to be discussed in its own issue.
Issue about improving our installation documentation https://github.com/kedro-org/kedro/issues/3281
Notice that, if `kedro new` could init the current directory, in principle users wouldn't need a global Kedro, but there would still be two installation steps:
- Create project directory: `mkdir spaceflights && cd spaceflights`
- Create venv: `python -m venv .venv && source .venv/bin/activate`
- Install Kedro: `pip install kedro`
- `kedro new --outdir .` (or whatever)
- Install project dependencies: `pip install -r requirements.txt`
At least 2 users consider this second installation step confusing but there's not much else that can be done I believe https://linen-slack.kedro.org/t/16040768/u05bdslpj72-finally-gave-the-steps-in-https-kedro-org-slack-#8a7c5923-7fee-4d46-9229-1ca566b248e8
Is there a route where we make `pipx` the recommended install path?
I don't think so. The problem here is that there are 2 CLI commands that have conflicting purposes:
- `kedro new` is unencumbered by project dependencies because the project doesn't exist by the time it's called, and it also doesn't change a lot, so there's no point in upgrading.
- `kedro run` (and anything related to the actual Kedro project) requires the project dependencies to work (hence mandates that `kedro` is installed in the same environment), and it should also be up to date to benefit from new features, bug fixes, and performance improvements.

I don't think there's a way to reconcile these two sets of requirements.
Slightly out-there suggestion: if we had a web UI on the website for the project add-ons workflow, we could then spit out a folder that covers the `kedro new` part.
Let me just bump an idea mentioned above:
> But there's also another approach - utilise a different, globally installed CLI tool to initialise Kedro projects - e.g. kedro-starter. This way you avoid the chicken & egg problem.
I believe it would solve many of the issues discussed in this thread. You could then install `kedro-starter` with e.g. `pipx`, and it would be quite a similar experience to using `cookiecutter` directly (in other words, you could treat `kedro-starter` as a wrapper on top of `cookiecutter`).
@jaklan that would overcome the main limitation I see with installing kedro in pipx, which is the conflicting versions between the local .venv kedro and the global pipx kedro. If the tool only takes up the `kedro-starter` or `kedro-new` namespace and leaves `kedro` free to be only the local .venv version, that would be a clean separation.
I think this feature is really useful. I use venv inside the root directory of my projects. Normally:
- I create a GitHub repository (with README.md and .gitignore)
- Clone the repo && cd repo
- Then create the venv and activate.
- New requirements.txt (numpy, pandas... kedro)
- `kedro new .` (for example)

It is not likely that I would start my project inside an existing one, so problems with pre-existing files are not something I am afraid of.
This is a quite simple feature (when explained) but apparently it has many internal tricky parts. I hope you can solve it.
:+1:
Another user complained about this today.