
Run `kedro new` without creating a new directory

Open jaklan opened this issue 4 years ago • 31 comments

Description & Context

Currently, when running `kedro new` I have to specify a folder where my project is created. It doesn't make sense when I use venv or Poetry with an in-project venv, because I have to create a directory myself anyway to init a venv and install Kedro there. As a result, I have to manually move the new project to the upper folder with `mv ~/repos/project_name/project_name/* ~/repos/project_name/`.

Possible Implementation

Add flag to skip creating a folder.

Possible Alternatives

Ask about it during init.

jaklan avatar Feb 01 '21 09:02 jaklan

I find it a bit awkward as well that you need to have kedro installed before you have created your project and likely have created your environment. Is it possible to give a cookie-cutter command alternative?

Potentially related to this is the chicken/egg situation of needing to have kedro installed (potentially globally) for project setup, in a version that may or may not be the one you want in the project. Is it possible to achieve the same results as `kedro install` with `pip install -e .`? Users may be onboarding to a new project on 0.16.x and simply grab the latest version off PyPI (`pip install kedro`) before running `kedro install`.

Other than running `pip install -r requirements.txt` or reading requirements.txt for the specific version of kedro, how should users know how to set up the project for local development?
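
For illustration, reading the pinned Kedro version out of requirements.txt could be sketched as below. This is a minimal sketch, not something Kedro provides: `pinned_kedro_version` is a hypothetical helper, and it assumes the project pins Kedro with an exact `==` specifier.

```python
import re
from typing import Optional

# Matches "kedro==X" or "kedro[extras]==X" at the start of a line.
# Other packages such as "kedro-viz" do not match, because "==" must
# follow the name (and optional extras) directly.
_PIN = re.compile(r"^\s*kedro(\[[^\]]*\])?\s*==\s*([^\s;#]+)", re.IGNORECASE)


def pinned_kedro_version(requirements_text: str) -> Optional[str]:
    """Return the exactly pinned Kedro version, or None if there is no `==` pin.

    Hypothetical helper for illustration only.
    """
    for line in requirements_text.splitlines():
        match = _PIN.match(line)
        if match:
            return match.group(2)
    return None
```

With the version in hand, a user could run `pip install kedro==<version>` instead of grabbing the latest release. Range specifiers such as `~=` are deliberately not interpreted here.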

WaylonWalker avatar Feb 04 '21 03:02 WaylonWalker

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 12 '21 15:04 stale[bot]

Bump

jaklan avatar Apr 12 '21 15:04 jaklan

Hey @jaklan thanks for the suggestion! It sounds like it might be related to https://github.com/cookiecutter/cookiecutter/issues/909 & https://github.com/cookiecutter/cookiecutter/pull/907 - I suggest you also express your interest there, maybe it'll speed it along for the next cookiecutter release. I think it'd be better for them to address it rather than us building a custom solution, but I'll leave this open for a while to gauge interest on the feature request. 🤔

lorenabalan avatar Apr 20 '21 16:04 lorenabalan

@WaylonWalker yes that's a valid point, though I'm not sure how it relates to the original question? It feels to me like it deserves its own separate discussion.

lorenabalan avatar Apr 20 '21 16:04 lorenabalan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 19 '21 17:06 stale[bot]

I was oblivious to this issue because I was using out-of-tree environments with conda/mamba, but a colleague that just tried to use Kedro with venv experienced the same confusion. Here is some insight into in-tree (local) vs out-of-tree (global) environment workflows https://snarky.ca/classifying-python-virtual-environment-workflows/

Notice we already have some documentation about using venv or Pipenv instead of conda https://kedro.readthedocs.io/en/stable/faq/faq.html#can-i-create-a-virtual-environment-without-conda although I think we could make it a bit more clear (https://github.com/kedro-org/kedro/issues/2360).

I think it would be good to tag this as "Won't fix" (since it's unlikely that the upstream issue in cookiecutter is ever addressed). I'm going to go ahead and do it, otherwise folks feel free to reverse my decision cc @AntonyMilneQB

astrojuanlu avatar Feb 24 '23 09:02 astrojuanlu

Another side effect of not being able to init a Kedro project in the current directory: users create weird structures with 2 READMEs for nothing, like https://github.com/pablovdcf/TFM_HADO_Cares

I know this issue was closed 2+ years ago but honestly it was basically the first pain point I encountered https://github.com/kedro-org/kedro/issues/2360 and it keeps coming up over and over again. I'm reopening so that we can reprioritize.

astrojuanlu avatar Oct 23 '23 12:10 astrojuanlu

This is needed to have venv/virtualenv as first-class citizens in the Kedro installation instructions I think (otherwise the workflow is too weird).

An informal poll run by @/brettcannon showed that the majority of developers store virtual environments next to their project code. https://snarky.ca/classifying-python-virtual-environment-workflows/

image

astrojuanlu avatar Oct 27 '23 14:10 astrojuanlu

I wonder what an equivalent poll result would look like for Kedro users. I suspect it would be much more biased towards global/central directory due to the prevalence of conda.

The point definitely still stands and it would be great if there were a better way to handle in-tree environments, but I would just be cautious assigning priority based on the above poll. It might be worth even doing the same sort of poll for kedro users (or potential kedro users I guess, because there's a selection bias as those who end up using kedro are more likely to be those who had a smooth experience using global environments). Maybe some such poll already exists for data scientists/similar.

antonymilne avatar Oct 27 '23 19:10 antonymilne

@antonymilne there's no need to overthink that topic and refer to any polls - Kedro should allow people to generate a project in the current directory, period. In-project venvs are absolutely common in the Python ecosystem and they should be simply supported. Especially taking into consideration the fact you can solve the issue with one command moved to Kedro internals.

jaklan avatar Oct 27 '23 19:10 jaklan

How would such a function be implemented? If I understand correctly, cookiecutter still doesn't support this today, so it needs to be implemented in Kedro.

Do we need to handle edge cases with existing files? Assuming an empty folder will be easy, would this be good enough?

noklam avatar Oct 27 '23 22:10 noklam

@noklam I have already answered that in the initial issue (more than 2 years ago btw...) - you can mimic mv inside Kedro.

> Assuming an empty folder will be easy, would this be good enough?

Of course not, because the whole discussion is about generating a project in the current directory, where you have .venv already created (and probably other files, like the Poetry-specific ones, as well).

> Do we need to handle edge cases with existing files?

You can simply display a proper warning when running `kedro new` with e.g. a `--cwd` flag and then wait for user confirmation. Of course it can be more sophisticated and analyse whether any files will be overwritten or not, but that's the easiest approach to start with.
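
A minimal sketch of that "mimic `mv`, but warn first" idea, under stated assumptions: `move_project_up` and its `confirm` callback are hypothetical names, not part of Kedro or cookiecutter, and the project is assumed to have already been generated into a nested directory. Unlike a shell `*` glob, `Path.iterdir()` also picks up dotfiles such as `.gitignore`.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable


def move_project_up(nested: Path, confirm: Callable[[list[str]], bool]) -> list[str]:
    """Move every entry of `nested` into its parent, then remove `nested`.

    `confirm` receives the names that already exist in the parent and
    returns True to overwrite them. Returns the names that were moved.
    Hypothetical helper for illustration only.
    """
    target = nested.parent
    names = sorted(p.name for p in nested.iterdir())  # includes dotfiles
    if nested.name in names:
        # Moving it would collide with the nested directory itself.
        raise ValueError(f"entry {nested.name!r} has the same name as its parent")

    clashes = [n for n in names if (target / n).exists() or (target / n).is_symlink()]
    if clashes and not confirm(clashes):
        raise FileExistsError(f"would overwrite: {', '.join(clashes)}")

    # The user agreed: replace clashing entries wholesale (no merging).
    for name in clashes:
        existing = target / name
        if existing.is_dir() and not existing.is_symlink():
            shutil.rmtree(existing)
        else:
            existing.unlink()

    for name in names:
        shutil.move(str(nested / name), str(target / name))
    nested.rmdir()
    return names
```

In a CLI, `confirm` could print the clashing names and wait for a yes/no answer. Nothing is touched unless the callback returns True or there are no clashes at all.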

Generally, I see there's also another issue about Poetry itself: https://github.com/kedro-org/kedro/issues/1722, so you need to implement a mechanism to move files anyway if you really want to support it. But there's also another approach - utilise a different, globally installed CLI tool to initialise Kedro projects - e.g. kedro-starter. This way you avoid the chicken & egg problem.

jaklan avatar Oct 28 '23 14:10 jaklan

Indeed, cookiecutter does not support this, nothing has changed since https://github.com/kedro-org/kedro/issues/681#issuecomment-823427629

Notice that cookiecutter already has "override/fail if exists" functionality, it's just that it always creates a subdirectory. We'll probably have to move files around.

We could start with a conservative stance, like "if any of the files I'm going to create already exists, fail". But this is easier said than done, because then kedro new would need knowledge of the cookiecutter structure https://github.com/cookiecutter/cookiecutter/issues/1004

It's actually easier to blindly copy everything over, but this poses data loss risk.
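
One way the conservative "fail if anything already exists" stance might be sketched without knowing the template layout in advance is to render into a scratch directory first and compare paths before copying. This is only a sketch under assumptions: `render_into` is a hypothetical helper, and `render` is a hypothetical callback standing in for whatever invokes the template engine (no cookiecutter API is assumed here).

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def render_into(target: Path, render: Callable[[Path], Path]) -> None:
    """Generate a project into `target`, refusing to overwrite anything.

    `render(scratch)` must create the project somewhere under `scratch`
    and return the generated project root. Hypothetical helper.
    """
    with tempfile.TemporaryDirectory() as scratch:
        root = render(Path(scratch))

        collisions = []
        for path in root.rglob("*"):  # "*" also matches dotfiles in pathlib
            rel = path.relative_to(root)
            dest = target / rel
            dest_present = dest.exists() or dest.is_symlink()
            # A directory that exists on both sides can simply be merged.
            if dest_present and not (path.is_dir() and dest.is_dir()):
                collisions.append(rel.as_posix())

        if collisions:
            raise FileExistsError(
                "refusing to overwrite: " + ", ".join(sorted(collisions))
            )

        # No collisions: merge the generated tree into the target directory.
        shutil.copytree(root, target, dirs_exist_ok=True)
```

The check happens before any file is written to `target`, so a failed run leaves an existing `.venv` or `pyproject.toml` untouched. `dirs_exist_ok` requires Python 3.8 or later.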

copier handles this beautifully, but refactoring kedro away from cookiecutter would be painful:

IMG_20231028_164656

I don't think this is impossible though. Once we agree this is needed, we'll have to think carefully about the path of least resistance.

astrojuanlu avatar Oct 28 '23 14:10 astrojuanlu

I never addressed @antonymilne's point:

> I wonder what an equivalent poll result would look like for Kedro users. I suspect it would be much more biased towards global/central directory due to the prevalence of conda.

And also because we are hiding our venv instructions behind a collapsible menu + the ergonomics are really weird:

image

so I wouldn't be surprised if the current users got sort of used to it. But that's the key trap we have to avoid, and you spelled it out already:

> (or potential kedro users I guess, because there's a selection bias as those who end up using kedro are more likely to be those who had a smooth experience using global environments)

I can run an informal poll in Slack and see what people think.

astrojuanlu avatar Oct 31 '23 17:10 astrojuanlu

On one hand, our over-reliance on conda creates some trouble for certain users. For example, here is a user that is struggling to install a compatible version of Kedro on Python 3.8 because of the pip and setuptools constraints https://linen-slack.kedro.org/t/16034230/hello-i-have-created-a-kedro-matlab-custom-dataset-which-i-w#20e4ffe4-e697-47fa-a722-d74a752b7bed

On the other hand, as much as I'd like this to happen, I'm reconsidering how impactful the change would be, because according to an informal survey I ran on Slack, the main annoyance seems to be that users have a "global" Kedro and a project-specific Kedro https://linen-slack.kedro.org/t/16040768/u05bdslpj72-finally-gave-the-steps-in-https-kedro-org-slack-#db2c34ed-43bf-4479-839c-5a4fb4154a10

So much so that some users don't use `kedro new` at all and rely on cookiecutter directly ❗ https://linen-slack.kedro.org/t/16031681/hello-here-wave-skin-tone-3-i-come-with-a-thorny-question-to#e0833bfb-336a-4e7c-acef-948cc0146694 cc @inigohidalgo. This is tricky because I don't know what the implications of the new add-ons flow are for this workflow.

However, this points to a new interesting direction that might have even more impact: making kedro-new a non-mandatory plugin. That's a large change that will need to be discussed in its own issue.

astrojuanlu avatar Nov 07 '23 08:11 astrojuanlu

Issue about improving our installation documentation https://github.com/kedro-org/kedro/issues/3281

astrojuanlu avatar Nov 07 '23 09:11 astrojuanlu

Notice that, if kedro new could init the current directory, in principle users wouldn't need a global Kedro, but there would still be two installation steps:

  1. Create the project directory: `mkdir spaceflights && cd spaceflights`
  2. Create a venv: `python -m venv .venv && source .venv/bin/activate`
  3. Install Kedro: `pip install kedro`
  4. Create the project in place: `kedro new --outdir .` (or whatever)
  5. Install project dependencies: `pip install -r requirements.txt`

At least 2 users consider this second installation step confusing but there's not much else that can be done I believe https://linen-slack.kedro.org/t/16040768/u05bdslpj72-finally-gave-the-steps-in-https-kedro-org-slack-#8a7c5923-7fee-4d46-9229-1ca566b248e8

astrojuanlu avatar Nov 07 '23 09:11 astrojuanlu

Is there a route where we make pipx the recommended install path?

datajoely avatar Nov 07 '23 09:11 datajoely

I don't think so. The problem here is that there are 2 CLI commands that have conflicting purposes:

  • kedro new is unencumbered by project dependencies because the project doesn't exist by the time it's called, and also it doesn't change a lot so there's no point in upgrading.
  • kedro run (and anything related to the actual Kedro project) requires the project dependencies to work (hence it mandates that kedro is installed in the same environment), and it should also be kept up to date to benefit from new features, bug fixes, and performance improvements.

I don't think there's a way to reconcile these two sets of requirements.

astrojuanlu avatar Nov 07 '23 10:11 astrojuanlu

Slightly out-there suggestion - if we had a web UI on the website for the project add-ons workflow, we could then spit out a folder that covers the kedro new part.

datajoely avatar Nov 07 '23 10:11 datajoely

Let me just bump an idea mentioned above:

> But there's also another approach - utilise a different, globally installed CLI tool to initialise Kedro projects - e.g. kedro-starter. This way you avoid the chicken & egg problem.

I believe it would solve many of the issues discussed in that thread. You could install kedro-starter with e.g. pipx then and it would be a quite similar experience to using cookiecutter directly (in other words, you could treat kedro-starter as a wrapper on top of cookiecutter).

jaklan avatar Nov 07 '23 11:11 jaklan

@jaklan that would overcome the main limitation I see from installing kedro in pipx, which is the conflicting versions between the local .venv kedro and the global pipx kedro. If the tool only takes up the kedro-starter or kedro-new namespace and leaves kedro open to only be the local .venv version that would be a clean separation.

inigohidalgo avatar Nov 07 '23 12:11 inigohidalgo

I think this feature is really useful. I use venv inside the root directory of my projects. Normally:

  • I create a GitHub repository (with README.md and .gitignore)
  • Clone the repo && cd repo
  • Then create the venv and activate.
  • New requirements.txt (numpy, pandas... kedro)
  • kedro new . (for example)

It is not likely that I would start my project inside an existing one, so problems with pre-existing files are not something I am afraid of.

This is a quite simple feature (when explained) but apparently it has many internal tricky parts. I hope you can solve it.

:+1:

martxelo avatar Feb 17 '24 07:02 martxelo

Another user complained about this today.

astrojuanlu avatar Jul 30 '24 14:07 astrojuanlu