incubator-devlake
feat: jupyter playground
This PR is a follow-up on this slack message, referencing this article.
The original goal was to make the process graph shown in the article available to the community of DevLake users.
However, this couldn't be achieved within DevLake's current capabilities: it doesn't qualify as a plugin, it requires graphviz as a container-level dependency, and it needs a way to provide the data to Grafana and display it there.
We concluded that the specific visualization would be just one example of a more flexible way to explore the data.
This PR is our idea of how this could be achieved: a "data playground" using the strengths of Python with pandas in Jupyter.
Potential next step
Using this Jupyter playground for data exploration requires a clone of the repo and a local development setup. It is also possible to add a Jupyter server as another (optional) container to the docker-compose set-up. This way the notebooks could be used in a browser.
Not yet included
- Running the tests in a GitHub action.
- Dynamic query for issue data for the status transition graph. Currently, it queries all the issues, not a certain scope.
- A data plotting library. For the status transition graph we use graphviz to plot the graph, and this is a specific requirement for this data structure. In the template example the data is just printed in a table. We have been using plotly in our Jupyter notebooks for visualizing data; however, as we didn't include an example yet, we also didn't include the dependency.
- Postgres support (no dependency on a client or connector has been added).
Is it a plugin or a feature? How can we use it? I think some additional documentation is needed.
@d4x1 We updated the PR with a description and also documentation within the change. I hope this clarifies our goals!
Adding: proposing a change to easily filter on issue type, dates and project key.
Added the following (pairing with @jochumb), so that a user can change whether to see the avg, mean, IQR or minmax as default, showing them all in the tooltip.
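To make the two comments above concrete, here is a rough sketch of filtering issues and computing the summary statistics. It is an assumption-laden illustration, not the notebook code from this PR: the `issues` records, their field names, and the `filter_issues` and `summarize` helpers are hypothetical. The median is included as an assumption, since "avg" and "mean" usually denote the same statistic.

```python
from datetime import date
from statistics import mean, median, quantiles

# Hypothetical issue records; field names are illustrative only.
issues = [
    {"project": "PROJ", "type": "BUG", "created": date(2023, 1, 5), "lead_time_days": 2.0},
    {"project": "PROJ", "type": "BUG", "created": date(2023, 2, 10), "lead_time_days": 6.0},
    {"project": "PROJ", "type": "STORY", "created": date(2023, 1, 20), "lead_time_days": 4.0},
    {"project": "OTHER", "type": "BUG", "created": date(2023, 1, 15), "lead_time_days": 9.0},
]


def filter_issues(rows, issue_type=None, project=None, start=None, end=None):
    """Keep rows matching every filter that is set; None means 'no filter'."""
    return [
        r for r in rows
        if (issue_type is None or r["type"] == issue_type)
        and (project is None or r["project"] == project)
        and (start is None or r["created"] >= start)
        and (end is None or r["created"] <= end)
    ]


def summarize(values):
    """Return mean, median, interquartile range, min and max (needs >= 2 values)."""
    q1, _q2, q3 = quantiles(values, n=4)
    return {
        "mean": mean(values),
        "median": median(values),
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }


bugs = filter_issues(issues, issue_type="BUG", project="PROJ",
                     start=date(2023, 1, 1), end=date(2023, 12, 31))
stats = summarize([r["lead_time_days"] for r in bugs])
print(stats)
```

Returning all statistics in one dictionary mirrors the idea of picking one as the default display value while showing the rest in a tooltip.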
Give me some time, and I'll review this PR.
- Here is a new repo, https://github.com/apache/incubator-devlake-playground, created for this playground. You can move the code to this repo. The DevLake repo is becoming more and more complicated, and the playground is an independent part. With a standalone repo, it can be updated conveniently. (For example, DevLake is still using Python 3.9, which is outdated.)
- I haven't run your code locally, but I have reviewed it; you can see my comments on the code.
Thanks for your contribution. I think adding a Jupyter playground deserves a new blog post on DevLake's official website.
Thanks! I think moving to a separate repository makes sense (we need to keep in mind there might be some hard coupling through the data model - when it comes to releasing and versioning, perhaps).
The repo is empty. I think it needs a first commit, because we can't fork it or open a PR:
@lenntt https://github.com/apache/incubator-devlake-playground is not empty now. (I hadn't noticed that empty repos cannot be forked.)
Thanks, https://github.com/apache/incubator-devlake-playground/pull/1 First PR is opened :)