kedro-viz
kedro-viz copied to clipboard
Refactor - Replace bootstrap(project) with configure project
Description
Development notes
QA notes
Checklist
- [ ] Read the contributing guidelines
- [ ] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [ ] Updated the documentation to reflect the code changes
- [ ] Added new entries to the
RELEASE.mdfile - [ ] Added tests to cover my changes
Copy the summary of discussion in Slack here:
After some debugging with @Merel , we found out what's the problem and have a solution now. The story turns out to be quite complicated:
- bootstrap_project was run properly, so kedro-viz CLI is part of KedroCLI
- Multiprocess behaves differently in Mac & Window (spawn process), on Linux it's "fork", so when we are testing things, CI is passing but fails locally.
- "spawn" process will re-import every library, which means PACKAGE_NAME get reset as None The solution is replacing bootstrap_project with configure_project(project_name), the PACKAGE_NAME should be add as an argument to run_server. (Look at ParallelRunner in Kedro to see how this is handle, https://github.com/kedro-org/kedro/blob/da709d4316c141c5a7d6f676a87a5752807b33f4/kedro/runner/parallel_runner.py#L4)
Hi Team,
To resolve https://github.com/kedro-org/kedro-viz/issues/1761, before we spawn a new process for viz, we need to check if the path/current dir has a valid kedro project (we do not want to call configure_project or bootstrap as they do more than what we want at this point). The other 2 alternatives I saw were -
- Either use
_get_project_metadata(Path.cwd())(which raises an exception) -
RuntimeError: Could not find the project configuration file 'pyproject.toml' in /Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro-viz. If you have created your project with Kedro version <0.17.0, make sure to update your project template. See https://github.com/kedro-org/kedro/blob/main/RELEASE.md#migration-guide-from-kedro-016-to-kedro-0170 for how to migrate your Kedro project.)
- Use
_find_kedro_projectmethod fromkedro.utils
We can have either of these in kedro_viz.launchers.cli before we start the process
1. # _get_project_metadata(Path.cwd())
2. # project_path = _find_kedro_project(Path.cwd())
# if not project_path:
# display_cli_message(
# "ERROR: Unable to find Kedro project.",
# "red"
# )
# return
Since both of them are private and internal, I checked with @ankatiyar . She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.
My thoughts - Kedro-Viz already uses few private methods of Kedro, so we can actually use either of the above 2 approaches (I would recommend using _find_kedro_project and make viz start with sub-directories). At the same time, we can reduce coupling by implementing logic directly on viz side as Ankita suggested.
cc: @rashidakanchwala @astrojuanlu
She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.
Any chance to make that function public on Kedro?
we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.
Tempting, but I think it should be kedro responsibility to determine what a Kedro project is, and not having Kedro-Viz duplicate logic.
My 2 cents. What do you think @merelcht ?
She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.
Any chance to make that function public on Kedro?
we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.
Tempting, but I think it should be
kedroresponsibility to determine what a Kedro project is, and not having Kedro-Viz duplicate logic.My 2 cents. What do you think @merelcht ?
I would actually prefer the opposite 😅 I would not be so quick to just make this method public and increasing the public API and would argue that this logic belongs as much to the framework as it does to Kedro Viz. Kedro Viz currently depends on there being a Kedro project, so I feel like it's not that weird to implement logic on the viz side to detect if a Kedro project is present. Using the private method is indeed the least preferred implementation, because of the coupling it introduces.
@merelcht's opinion prevails 😄 I don't have a strong preference
For the record: kedro-mlflow also importing the private API https://github.com/Galileo-Galilei/kedro-mlflow/pull/539/files#diff-1391ca4513935426e2b4545a1f6f052cc12e0814ea9d3b21c8a16aa2d0de6baa
She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.
Any chance to make that function public on Kedro?
we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.
Tempting, but I think it should be
kedroresponsibility to determine what a Kedro project is, and not having Kedro-Viz duplicate logic. My 2 cents. What do you think @merelcht ?I would actually prefer the opposite 😅 I would not be so quick to just make this method public and increasing the public API and would argue that this logic belongs as much to the framework as it does to Kedro Viz. Kedro Viz currently depends on there being a Kedro project, so I feel like it's not that weird to implement logic on the viz side to detect if a Kedro project is present. Using the private method is indeed the least preferred implementation, because of the coupling it introduces.
Thank you @merelcht for responding. While I agree the logic is also a responsibility of plugin developer, on maintenance front where in future if the logic of finding a kedro project changes (may be kedro framework team decides on having a new template), the plugin developer should also update the logic which duplicates the code twice.
I do not completely understand why should this util function is not made public. Could you please let me know your thoughts on this. Thank you
I do not completely understand why should this util function is not made public. Could you please let me know your thoughts on this. Thank you
In my opinion, these kind of utility functions that hold a lot of logic for internal specifics of Kedro just don't belong in a public API. Making something like this public would mean that if we ever decide to change it we'd have to do a major breaking release on Kedro. It would make it very hard to change internal logic without breaking releases. IMO breaking changes should serve a clear and useful purpose for the majority of our users, who are first and foremost, the Framework users and these users generally don't use or care at all for internal utility functions.
However, I do think we need to make a proper effort of de-coupling Kedro Viz and Kedro where possible and critically review how the Kedro API is used in Viz to assess how we can make sustainable version on the Kedro side to prevent situations like the one here and what we're seeing with the transcoding imports as well.
Thanks, @merelcht. I also believe these utility functions don't change significantly. Therefore, if we duplicate the above two functions, we could do the same for transcoding functions. Besides these, I don't think there are any other private functions that Kedro-viz imports.
Of course, we still import public classes and use private methods of those classes. But that's a more difficult problem to solve and need to be solved another way.
To resolve https://github.com/kedro-org/kedro-viz/issues/1761, before we spawn a new process for viz, we need to check if the path/current dir has a valid kedro project
Can you explains why is this the case? Reading the PR, it's still using the configure_project way that I suggested. I am getting lost in the discussion since the PR was created so long ago, what's changed? Can you list out the requirements if this solution actually doesn't work?
For the record: kedro-mlflow also importing the private API https://github.com/Galileo-Galilei/kedro-mlflow/pull/539/files#diff-1391ca4513935426e2b4545a1f6f052cc12e0814ea9d3b21c8a16aa2d0de6baa
Why do kedro-mlflow need that? This may have changed since Kedro CLI can be run anywhere inside a repository now.
To resolve #1761, before we spawn a new process for viz, we need to check if the path/current dir has a valid kedro project
Can you explains why is this the case? Reading the PR, it's still using the
configure_projectway that I suggested. I am getting lost in the discussion since the PR was created so long ago, what's changed? Can you list out the requirements if this solution actually doesn't work?For the record: kedro-mlflow also importing the private API https://github.com/Galileo-Galilei/kedro-mlflow/pull/539/files#diff-1391ca4513935426e2b4545a1f6f052cc12e0814ea9d3b21c8a16aa2d0de6baa
Why do kedro-mlflow need that? This may have changed since Kedro CLI can be run anywhere inside a repository now.
I have updated the description and this PR aims at moving away from bootstrap_project as you suggested @noklam . I have another PR in draft which solves #1761 .