kedro
kedro copied to clipboard
Create a docs page about adding code beyond starter files
Description
Part of progress towards: #3418
Per this discussion in Slack with @noklam, I took a task of creating a guide on how to add code beyond starter files in Kedro.
P.S. It's my first contribution to the project, so I'd be happy to iterate as much as needed until it satisfies core team's needs. And appreciate any feedback.
Development notes
- [x] Built docs locally to test that hyperlinks and formatting work as expected.
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by
line in the commit message. See our wiki for guidance.
If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
- [x] Read the contributing guidelines
- [x] Signed off each commit with a Developer Certificate of Origin (DCO)
- [x] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [x] Updated the documentation to reflect the code changes
- [x] Added a description of this change in the
RELEASE.md
file - [ ] Added tests to cover my changes
- [x] Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team
Rendered version: https://kedro--3852.org.readthedocs.build/en/3852/kedro_project_setup/code_beyond_starter_files.html
@noklam, @astrojuanlu, could you please have a look?
Hey @merelcht - thanks for comments again, I just finished implementing them.
QQ also about the vale
usage in CI. I noticed that it produces some warnings on this PR, but apparently on many other existing MD files. So it's like a soft CI check?
QQ also about the
vale
usage in CI. I noticed that it produces some warnings on this PR, but apparently on many other existing MD files. So it's like a soft CI check?
Yes, it's a soft check. I usually treat it as a guidance on grammar/word usage. It does also check for spelling mistakes, which is helpful. But you definitely don't need to address it all.
I am happy to review/edit this if you decide to retain as docs, but please let me know what you decide about this:
I like a lot of the recommendation here, but I hesitate to make it an official recommendation without the team discussing a bit more. Should we extract the content and make it a guest blog post instead? @astrojuanlu
LMK if you want to have a blog post instead, which sounds like a great idea, and then you could extract from the post if you wanted a longer-term piece of content that you maintain officially as docs. I wouldn't be able to craft the post without some notice and a ticket, but would definitely support the idea and help with later stages to edit and post, if you need it.
Just ping me again when you decide how to go forward!
Hi @stichbury, we've discussed this in backlog grooming and decided we'd like to get it in as docs. I really appreciate any help you can give getting this in shape 🙏
Hi @stichbury, we've discussed this in backlog grooming and decided we'd like to get it in as docs. I really appreciate any help you can give getting this in shape 🙏
For sure! I'm not sure if I'll be able to get to this in this sprint, but I'll add to my list; otherwise it'll be next week. To save holding you up, probably best to get it into the state where you want me to do a final review/update and let me know it is ready.
Thanks for the comments team! I'll get to them next week and come back.
@yury-fedotov @astrojuanlu Can you ping me when this is ready for my review and I'll get you any final feedback on the candidate draft.
There are a couple of outstanding comments waiting for response from @noklam and after that I think we should be good for a next round of reviews. Thanks for your patience @yury-fedotov 🙏🏼
I'll ping you too @stichbury . It's not ready yet.
So I'm just seeing this for the fist time. Thank you for the push @yury-fedotov. I feel quite strongly that we're overcomplicating things:
Problem:
- Business logic grows in complexity and should live and be tested outside of your core Kedro project.
- You should also be aspiring to achieve the general principle of loose coupling, high cohesion
- Python paths and packaging in general is a pain for many novice programmers.
What we're talking about here isn't Kedro specific at all, so there's an argument we shouldn't be writing extensive docs and perhaps would be better placed to link to relevant resources.
If we are to provide a solution, this would be my approach:
- This is a solved problem at QuantumBlack with tooling to create and manage python packages within a monrepo. The monorepo section as it's written today touches on some good points, but in my opinion we should pivot this section into explaining to users how to achieve this best, the whole section about
pipeline_registry.py
andsettings.py
feels way off piste to me.- We should explain how to create a minimal python package, the Dagster folks have some great accessible resources here (1) (2)
- We should explain the virtues of writing and testing business logic, uncoupled from pipeline flow logic.
- This is the pattern we should be encouraging and introducing concepts like
pip install path/to/package -e
feels like something we should be explaining here.
Thanks for review @datajoely !
I agree with what you said. I think the current content reflects that I had a different user problem in mind while writing this, and that caused the content to diverge from what you would've expected for the problem definition you mentioned.
Here's the difference.
You definition (copy pasting)
- Business logic grows in complexity and should live and be tested outside of your core Kedro project.
- You should also be aspiring to achieve the general principle of loose coupling, high cohesion
- Python paths and packaging in general is a pain for many novice programmers.
Those problems are undoubtedly relevant IMO, I agree.
What I was trying to cover
- Beginner users are sometimes unsure where to put the code as the project grows. Can it only be in
nodes.py
andpipeline.py
? Would it work if I create a pipeline directory manually instead ofkedro pipeline create
? We indeed know answers to those, but the guide was trying to give those to people who are very new to Kedro. - It can also be challenging to grasp how would Kedro work if project root != git repo root.
It is actually an open question if problems I refer to above are relevant, but as I understood from 3418, they are.
Proposed next steps
I'm totally happy to adjust the content as much as needed, including ground up rewrite if it would make it more useful. But would appreciate a coordinated maintainers opinion on what should this content be.
@datajoely , @merelcht , @astrojuanlu , @noklam do you mind discussing internally (or I am happy to join too) how does the core team see the resolution of 3418 and let me know?
While this doesn't entirely close #3418, I do think it's a valuable addition to our docs and I would very much like this to go in. @noklam Could you have another look and comment on what you think still needs to change?
Lol, I wanted to resolve merge conflict but hit very weird pre-commit issue, see here: https://github.com/kedro-org/kedro/issues/4004
I closed it though as not sure why that happens, maybe specific to my computer.
Resolved through GitHub UI since it was a one line thing in RELEASE.md
Hey team, I've addressed recent comments. Could you have another look please?
I also went through the Language Linter for Kedro Docs
error logs, and not sure how to proceed, since it does propose a few changes to this file, but it does that for most of the docs files in general. I understood it's used as a soft check?