kedro-viz
kedro-viz copied to clipboard
Kedro new --tools=viz produce files like `stats.json` in the root directory
Description
https://linen-slack.kedro.org/t/16359356/i-ve-just-been-playing-with-kedro-gt-0-19-0-what-is-creating#ac36141d-bfa9-4cd5-84d7-a2a683992687
Context
The root cause is in 0.19 we introduced a new project creation flow. We can put it in a .viz folder and make sure version control ignores it. These files are meant to be internal for Kedro-viz, thus they should not be stored in the project root directory.
I think it should be fine to put it in a hidden folder. And I'd also be in favour of adding stats.json and session_store.db to the .gitignore file. (edited)
Steps to Reproduce
Expected Result
Actual Result
-- If you received an error, place it here.
-- Separate them if you have more than one.
Your Environment
- Kedro version used (
pip show kedroorkedro -V): - Python version used (
python -V): - Operating system and version:
Spotted this in the wild https://gitlab.com/odyssean-institute/asrs-trade-kedro (found by @stichbury)
Note that we should change the default of SESSION_STORE_ARGS, for example
https://github.com/kedro-org/kedro-starters/blob/a829a9b29b9f0020994e71d69dcadd100caf8d7b/spaceflights-pandas-viz/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/settings.py#L22
Hi Team,
So the idea is to have .viz folder for all the files related to kedro-viz (stats.json, session_store.db) and other files in future.
For the creation of .viz folder -
- At this moment, stats.json is created by a hook on kedro-viz side. We can create the .viz folder during the hook run but the users can disable the hook via settings of the kedro project, so this does not seem a right approach
- Should we make .viz folder a default folder that is created when someone runs
kedro new --tools=viz? This seems to be a more streamlined approach to create the folder. The viz hook can check if the folder exists, if not it can also create the folder during the hook run.
Few questions -
- Since the stats.json file is in the root directory for existing projects, should we also in some way delete the file for the user ? Or document the new changes ?
- Is there a way to create a kedro project without using
kedro newas the documentation states there are several ways to create a kedro project (which I am not aware of). In that case, creating a .viz folder becomes a manual approach and we should document the same.
@noklam do you have any suggestions here ?
Thank you
Hi @astrojuanlu and @noklam,
I have created a naive migration draft PR - https://github.com/kedro-org/kedro-viz/pull/1915/files
What it does -
- This will move the files created for viz inside the .viz folder
- Users need to change the SESSION_STORE_ARGS as
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / ".viz")}. This needs to be done for all the starters (manual approach and I am not sure of any alternative at this moment) - We defined a constant on viz side
VIZ_METADATA_LOCATIONwhich will be used by viz hook to place stats.json
Questions:
- How can we force users to point SESSION_STORE_ARGS to
parent/.viz? - Is there a better approach than the manual approach mentioned in (2) above for the starters ?
- stats.json is always re-written for kedro run, so the migration is not an issue. But
session_store.dbneeds migration to get the old ET runs. Since this is just a file, we can ask users to copy it, but should we have a migration script which will run based on the location (I can think of a naive approach i.e., if the new location has .viz -> run the migration. Something else would be to look at the parent and check for .db files as we do in _merge method. This assumes user createdparent/.viz).
Please suggest when you have some time
cc: @rashidakanchwala
Thank you
Upon skimming @ravi-kumar-pilla's PR and comment, I wonder if this is really a Kedro-Viz problem. Would this be solved by
- changing the default
SESSION_STORE_ARGSin all our starters, and - adding
.vizto the.gitignorein all starters
?
We need changes on both side:
kedro-vizmust change because right now thestats.jsonis hard coded to project_path/stats.json, which does not respect theSESSION_STORE_ARGS.kedro-starterneed to update the "default" session_store_args to.vizfolder.
Bonus:
In my opinion, it would be better to not have SESSION_STORE_ARGS value in settings.py, we can keep the commented version there for discoveribilty, but kedro-viz should have a default location, if SESSION_STORE_ARGS exists then override.
We are also overloading "SESSION_STORE_ARGS" as "VIZ_METADATA_ARGS" here, the stats.json has nothing to do with session store. For longer term, the question is WHERE is a good place for placing plugin specific config? Whatever we are doing here is only possible with kedro-viz. Let's say if kedro-mlflow also come up with some new project config, it's not possible to update our starters to match it.
For longer term, the question is WHERE is a good place for placing plugin specific config? Whatever we are doing here is only possible with
kedro-viz. Let's say ifkedro-mlflowalso come up with some new project config, it's not possible to update our starters to match it.
agree with @noklam on this. It should be plugin's responsibility of having default locations.
- Ideally, kedro-starters should not have any config variables specific to plugins.
- If there is a need for plugin specific config in kedro, how about having those as inputs while creating the project, something like -
kedro new --tools=viz-> this will prompt for viz specific config file location ? -> if not provided, viz should have default settings for all the config needed. So kedro will not store individual variables but only file path for plugin config if provided. - This would help in dealing with future config variables and allowing users to customize their project.
Thank you
For longer term, the question is WHERE is a good place for placing plugin specific config?
I guess you mean artifacts rather than config, but anyway 👍🏼
Whatever we are doing here is only possible with kedro-viz. Let's say if kedro-mlflow also come up with some new project config, it's not possible to update our starters to match it.
If anything, we can provide a set of good practices for plugins, but it will be difficult to prescribe the general case because we can't anticipate everything the users will want to do.
And yet, our starters are a special case, precisely because we try to provide pre-configured templates for our users. I'm not sure it's possible or reasonable for starters to try to stay away from plugin-specific stuff, while also providing a template that uses said plugin... If we were to provide a spaceflights-mlflow starter we'd be in a similar situation probably.
this will prompt for viz specific config file location
I'd rather not overload users with questions on the wizard, it's on us to provide a sensible default
kedro-viz should have a default location, if SESSION_STORE_ARGS exists then override. We are also overloading "SESSION_STORE_ARGS" as "VIZ_METADATA_ARGS" here, the stats.json has nothing to do with session store.
Good points. Other thoughts?
@ravi-kumar-pilla @rashidakanchwala Can we make it so that Kedro-Viz assumes a default SESSION_STORE_ARGS of
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / ".viz")}
?
That would be the first step towards fixing the issue. The second one would be to update the starters to actually remove the SESSION_STORE_ARGS from them, so that Viz uses its default.
We are also overloading "SESSION_STORE_ARGS" as "VIZ_METADATA_ARGS" here, the stats.json has nothing to do with session store.
@noklam Could you open a new issue about this describing it in a bit more detail?
Hi @astrojuanlu and @noklam ,
Thanks for the suggestions. Based on the comments, as a first step -
On Kedro-Viz side -
- Create 2 constants at
package/kedro_viz/constants.py-
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / ".viz")}
VIZ_METADATA_ARGS = {"path": str(Path(__file__).parents[2] / ".viz")}
- If the user provides SESSION_STORE_ARGS, Kedro-Viz will use it to store the session_store.db file. If not, it will use the SESSION_STORE_ARGS["path"] .
- Start writing/reading stats.json to/from VIZ_METADATA_ARGS["path"] with the new release.
- Document about the default SESSION_STORE_ARGS here .
- Document about
stats.jsonbeing created under.vizin kedro docs if they have viz installed and did not disable hooks for plugins. - Starters will not be modified for now.
Let me know your thoughts.
Thank you