kedro Create a docs page about adding code beyond starter files

Description

Part of progress towards: #3418

Per this discussion in Slack with @noklam, I took a task of creating a guide on how to add code beyond starter files in Kedro.

P.S. It's my first contribution to the project, so I'd be happy to iterate as much as needed until it satisfies core team's needs. And appreciate any feedback.

Development notes

[x] Built docs locally to test that hyperlinks and formatting work as expected.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

[x] Read the contributing guidelines
[x] Signed off each commit with a Developer Certificate of Origin (DCO)
[x] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
[x] Updated the documentation to reflect the code changes
[x] Added a description of this change in the RELEASE.md file
[ ] Added tests to cover my changes
[x] Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

May 06 '24 06:05 yury-fedotov

Rendered version: https://kedro--3852.org.readthedocs.build/en/3852/kedro_project_setup/code_beyond_starter_files.html

May 13 '24 21:05 yury-fedotov

@noklam, @astrojuanlu, could you please have a look?

May 13 '24 21:05 yury-fedotov

Hey @merelcht - thanks for comments again, I just finished implementing them.

QQ also about the vale usage in CI. I noticed that it produces some warnings on this PR, but apparently on many other existing MD files. So it's like a soft CI check?

May 17 '24 01:05 yury-fedotov

QQ also about the vale usage in CI. I noticed that it produces some warnings on this PR, but apparently on many other existing MD files. So it's like a soft CI check?

Yes, it's a soft check. I usually treat it as a guidance on grammar/word usage. It does also check for spelling mistakes, which is helpful. But you definitely don't need to address it all.

May 17 '24 07:05 merelcht

I am happy to review/edit this if you decide to retain as docs, but please let me know what you decide about this:

I like a lot of the recommendation here, but I hesitate to make it an official recommendation without the team discussing a bit more. Should we extract the content and make it a guest blog post instead? @astrojuanlu

LMK if you want to have a blog post instead, which sounds like a great idea, and then you could extract from the post if you wanted a longer-term piece of content that you maintain officially as docs. I wouldn't be able to craft the post without some notice and a ticket, but would definitely support the idea and help with later stages to edit and post, if you need it.

Just ping me again when you decide how to go forward!

May 20 '24 12:05 stichbury

Hi @stichbury, we've discussed this in backlog grooming and decided we'd like to get it in as docs. I really appreciate any help you can give getting this in shape 🙏

May 20 '24 14:05 merelcht

Hi @stichbury, we've discussed this in backlog grooming and decided we'd like to get it in as docs. I really appreciate any help you can give getting this in shape 🙏

For sure! I'm not sure if I'll be able to get to this in this sprint, but I'll add to my list; otherwise it'll be next week. To save holding you up, probably best to get it into the state where you want me to do a final review/update and let me know it is ready.

May 20 '24 16:05 stichbury

Thanks for the comments team! I'll get to them next week and come back.

May 24 '24 00:05 yury-fedotov

@yury-fedotov @astrojuanlu Can you ping me when this is ready for my review and I'll get you any final feedback on the candidate draft.

May 29 '24 11:05 stichbury

There are a couple of outstanding comments waiting for response from @noklam and after that I think we should be good for a next round of reviews. Thanks for your patience @yury-fedotov 🙏🏼

May 29 '24 12:05 astrojuanlu

I'll ping you too @stichbury . It's not ready yet.

May 29 '24 15:05 yury-fedotov

So I'm just seeing this for the fist time. Thank you for the push @yury-fedotov. I feel quite strongly that we're overcomplicating things:

Problem:

Business logic grows in complexity and should live and be tested outside of your core Kedro project.
You should also be aspiring to achieve the general principle of loose coupling, high cohesion
Python paths and packaging in general is a pain for many novice programmers.

What we're talking about here isn't Kedro specific at all, so there's an argument we shouldn't be writing extensive docs and perhaps would be better placed to link to relevant resources.

If we are to provide a solution, this would be my approach:

This is a solved problem at QuantumBlack with tooling to create and manage python packages within a monrepo. The monorepo section as it's written today touches on some good points, but in my opinion we should pivot this section into explaining to users how to achieve this best, the whole section about pipeline_registry.py and settings.py feels way off piste to me.
- We should explain how to create a minimal python package, the Dagster folks have some great accessible resources here (1) (2)
- We should explain the virtues of writing and testing business logic, uncoupled from pipeline flow logic.
- This is the pattern we should be encouraging and introducing concepts like pip install path/to/package -e feels like something we should be explaining here.

May 29 '24 17:05 datajoely

Thanks for review @datajoely !

I agree with what you said. I think the current content reflects that I had a different user problem in mind while writing this, and that caused the content to diverge from what you would've expected for the problem definition you mentioned.

Here's the difference.

You definition (copy pasting)

Business logic grows in complexity and should live and be tested outside of your core Kedro project.
You should also be aspiring to achieve the general principle of loose coupling, high cohesion
Python paths and packaging in general is a pain for many novice programmers.

Those problems are undoubtedly relevant IMO, I agree.

What I was trying to cover

Beginner users are sometimes unsure where to put the code as the project grows. Can it only be in nodes.py and pipeline.py? Would it work if I create a pipeline directory manually instead of kedro pipeline create? We indeed know answers to those, but the guide was trying to give those to people who are very new to Kedro.
It can also be challenging to grasp how would Kedro work if project root != git repo root.

It is actually an open question if problems I refer to above are relevant, but as I understood from 3418, they are.

Proposed next steps

I'm totally happy to adjust the content as much as needed, including ground up rewrite if it would make it more useful. But would appreciate a coordinated maintainers opinion on what should this content be.

@datajoely , @merelcht , @astrojuanlu , @noklam do you mind discussing internally (or I am happy to join too) how does the core team see the resolution of 3418 and let me know?

Jun 03 '24 16:06 yury-fedotov

While this doesn't entirely close #3418, I do think it's a valuable addition to our docs and I would very much like this to go in. @noklam Could you have another look and comment on what you think still needs to change?

Jul 09 '24 09:07 merelcht

Lol, I wanted to resolve merge conflict but hit very weird pre-commit issue, see here: https://github.com/kedro-org/kedro/issues/4004

I closed it though as not sure why that happens, maybe specific to my computer.

Resolved through GitHub UI since it was a one line thing in RELEASE.md

Jul 11 '24 03:07 yury-fedotov

Hey team, I've addressed recent comments. Could you have another look please?

I also went through the Language Linter for Kedro Docs error logs, and not sure how to proceed, since it does propose a few changes to this file, but it does that for most of the docs files in general. I understood it's used as a soft check?

Sep 07 '24 03:09 yury-fedotov

kedro kedro copied to clipboard

Create a docs page about adding code beyond starter files

Description

Development notes

Developer Certificate of Origin

Checklist

You definition (copy pasting)

What I was trying to cover

Proposed next steps

kedro
kedro copied to clipboard