[Cron] could you generate data for OpenGalaxy?
Description
Reference: https://github.com/X-lab2017/open-galaxy/issues/34
Hi guys, may I request a cron task that generates data for OpenGalaxy? If you can help, please describe in the PR how you filtered nodes and links to reduce the size of the graph data.
Cron Expression
every month
/self-assign
I can do this for you. To generate OpenGalaxy global data every month, I think we can reuse the condition from the basic metrics export task, which is OpenRank > e.
We can export all repo and user nodes with a monthly OpenRank value larger than e and activity larger than 2, to avoid too many edges.
With these thresholds, we get a graph with 94,789 nodes and 133,960 edges for 2023-01, which is a desirable graph size.
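As a rough illustration of the filter described above, here is a minimal Python sketch. The record fields (`id`, `openrank`, `activity`, `source`, `target`) and the function name are assumptions made for the example, not the actual OpenDigger export schema, and it assumes both thresholds apply to nodes.

```python
import math

# Thresholds from the comment above; everything else here is hypothetical.
OPENRANK_MIN = math.e
ACTIVITY_MIN = 2


def filter_graph(nodes, edges):
    """Keep nodes above both thresholds, then keep edges whose endpoints both survive."""
    kept = {
        n["id"]: n
        for n in nodes
        if n["openrank"] > OPENRANK_MIN and n["activity"] > ACTIVITY_MIN
    }
    kept_edges = [e for e in edges if e["source"] in kept and e["target"] in kept]
    return list(kept.values()), kept_edges
```

Dropping an edge as soon as either endpoint is filtered out keeps the exported graph free of dangling links.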
I will also normalize the edge length into the range 10 - 30 based on the activity score, so the graph renders properly in OpenGalaxy.
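One possible way to map activity scores into the 10 - 30 range is min-max scaling, sketched below. The direction of the mapping (higher activity gives a shorter edge) and the function name are assumptions for illustration; the thread does not specify the exact formula.

```python
def edge_lengths(activities, lo=10.0, hi=30.0):
    """Min-max scale activity scores into [lo, hi]; higher activity -> shorter edge (assumed)."""
    if not activities:
        return []
    a_min, a_max = min(activities), max(activities)
    if a_max == a_min:
        # All scores equal: fall back to the middle of the range.
        return [(lo + hi) / 2] * len(activities)
    span = a_max - a_min
    return [hi - (a - a_min) / span * (hi - lo) for a in activities]
```

For example, `edge_lengths([2, 6, 10])` gives `[30.0, 20.0, 10.0]`.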
A 3,000-iteration layout calculation will be used to generate positions in 3D space, and I will try my best to produce a continuous layout.
I think we can also export label data to a new file, which you can use in OpenGalaxy to render nodes in different colors.
Hi @frank-zsy, thank you for the explanation on export details.
What do you mean by "also export label data"? Is it a different file from https://oss.x-lab.info/open_galaxy/v2/labels.json?
The label data here means the label data in OpenDigger, such as whether a repo belongs to a company or a foundation. Currently we have more than 10,000 repos and 380 orgs with labels, so it will cover many of the repo nodes and work well for color rendering.
That sounds great!
I would like to generate all the data for OpenGalaxy by month from 201501 to 202301 with continuous layout positions.
I will upload the data to OSS under folders named 201501 - 202301, which means you can set ?v=201812 in the URL to load the data of 201812. The default version would be the latest month, 202301 for now.
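To make the version lookup concrete, here is a small Python sketch of how a `v` query parameter could be resolved against the available folders, falling back to the latest month. The function name and the `example.org` URL are hypothetical; OpenGalaxy itself may handle this differently.

```python
from urllib.parse import urlparse, parse_qs


def resolve_version(url, available, param="v"):
    """Return the requested month if its folder exists, else the latest available one."""
    requested = parse_qs(urlparse(url).query).get(param, [None])[0]
    # Fixed-width yyyymm strings sort chronologically, so max() is the latest month.
    return requested if requested in available else max(available)


folders = {"201501", "201812", "202301"}
print(resolve_version("https://example.org/galaxy?v=201812", folders))
print(resolve_version("https://example.org/galaxy", folders))
```

The first call resolves to `201812` and the second falls back to `202301`.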
Does this make sense to you? @tyn1998
Is yyyy-mm a valid folder name and a valid URL param? If so, I prefer yyyy-mm.
OK, I think it is.
@tyn1998 Do you plan to put more effort into OpenGalaxy? I found it really hard to produce a continuous layout for the 3D galaxy. Is the data for 2023-01 enough for now? I cannot find a proper way to generate the data.
The parameters we should consider are:
- How to set the bounds based on the current node count or total OpenRank. With different node counts, the galaxy size should be different.
- How to determine whether the layout calculation has converged. Currently we use `ngraph.offline.layout`, an offline tool that calculates the layout for a fixed number of iterations, but we cannot tell whether the layout has converged.
- How to check whether the layouts are continuous. Even if I use last month's layout positions as the initial positions for the next month's calculation, I still cannot know whether the generated layouts are continuous. In the ECharts force layout graph component, the graph may rotate for several rounds before it converges. So the layouts may not be continuous, but I cannot tell.
So I think generating layouts for the timeline will be a long-term task.
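For the convergence and continuity questions above, two simple diagnostics might be a starting point. This is only a sketch under stated assumptions: layouts are plain dicts mapping a node id to an `(x, y, z)` tuple, the function names are made up for the example, and neither function is part of `ngraph.offline.layout`.

```python
import math
from itertools import combinations


def max_displacement(prev, curr):
    """Largest distance any node moved between two snapshots of the same layout.

    A value that stays below a chosen tolerance over several iterations
    suggests the layout has stopped moving.
    """
    return max(math.dist(prev[n], curr[n]) for n in curr)


def distance_drift(layout_a, layout_b):
    """Mean absolute change in pairwise distance over nodes present in both layouts.

    Pairwise distances are unchanged by rotation or translation, so a layout
    that only rotated between two months scores close to 0.
    """
    shared = sorted(layout_a.keys() & layout_b.keys())
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    total = sum(
        abs(math.dist(layout_a[u], layout_a[v]) - math.dist(layout_b[u], layout_b[v]))
        for u, v in pairs
    )
    return total / len(pairs)
```

Comparing all pairs is quadratic in the number of shared nodes, so for a graph of the size discussed here one would likely evaluate a random sample of pairs instead.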
Hi, @frank-zsy, thanks a lot for your effort!
The knowledge and skills needed to generate continuous layouts for OpenGalaxy are indeed complex. I agree with you that more time and energy are needed to complete the challenge.
For now, the data of 2023-01 is enough for building a demo application.
@tyn1998 Thanks, I will look into the details in the future.
Hi @frank-zsy, could you export OpenGalaxy data of 2023-02 and set it as the default data?
Hello @frank-zsy, could you export the data of 2023-09 and make it default?