[Cron] could you generate data for OpenGalaxy?

Open tyn1998 opened this issue 3 years ago • 13 comments

Description

Reference: https://github.com/X-lab2017/open-galaxy/issues/34

Hi guys, may I request a cron task that generates data for OpenGalaxy? If you can help, please describe in the PR how you filtered nodes and links to reduce the size of the graph data.

Cron Expression

every month

tyn1998 avatar Feb 22 '23 02:02 tyn1998

/self-assign

I can do this for you. To generate the global OpenGalaxy data every month, I think we can reuse the condition from the basic metrics export task, which is openrank > e.

We can export all the repo and user nodes with a monthly OpenRank value larger than e and activity larger than 2, to avoid too many edges.

With the thresholds above, we get a graph with 94,789 nodes and 133,960 edges for 2023-01, which is a desirable graph size.

I will also normalize the edge length into the range 10 to 30 according to the activity score, so the graph renders properly in OpenGalaxy.

A 3,000-iteration layout calculation will be used to generate positions in 3D space. I will try my best to give a continuous position layout result.
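The filtering and edge-length steps described above could look roughly like the sketch below. This is not the OpenDigger export code: the field names (`id`, `openrank`, `activity`, `source`, `target`) are hypothetical, the activity threshold is read here as a per-node filter (it may instead apply to edges), and the linear mapping with "higher activity gives a shorter edge" is an assumption.

```python
import math

OPENRANK_MIN = math.e            # threshold from the thread: openrank > e
ACTIVITY_MIN = 2.0               # threshold from the thread: activity > 2
LEN_MIN, LEN_MAX = 10.0, 30.0    # edge length range from the thread


def filter_nodes(nodes):
    """Keep nodes whose monthly OpenRank and activity both exceed the thresholds."""
    return [
        n for n in nodes
        if n["openrank"] > OPENRANK_MIN and n["activity"] > ACTIVITY_MIN
    ]


def edge_lengths(edges, kept_ids):
    """Drop edges touching removed nodes, then map activity to a length in [10, 30].

    Assumption: linear min-max scaling, with higher activity giving a shorter edge.
    """
    kept = [e for e in edges if e["source"] in kept_ids and e["target"] in kept_ids]
    if not kept:
        return []
    lo = min(e["activity"] for e in kept)
    hi = max(e["activity"] for e in kept)
    span = hi - lo
    result = []
    for e in kept:
        # When all activities are equal, place every edge at the midpoint.
        t = (e["activity"] - lo) / span if span else 0.5
        result.append({**e, "length": LEN_MAX - t * (LEN_MAX - LEN_MIN)})
    return result
```

Usage would be `kept = filter_nodes(nodes)` followed by `edge_lengths(edges, {n["id"] for n in kept})`.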

frank-zsy avatar Feb 22 '23 03:02 frank-zsy

I think we can also export label data to a new file, so you can use it in OpenGalaxy to render the nodes in different colors.

frank-zsy avatar Feb 22 '23 03:02 frank-zsy

Hi @frank-zsy, thank you for the explanation on export details.

What do you mean by "also export label data"? Is it a different file from https://oss.x-lab.info/open_galaxy/v2/labels.json?

tyn1998 avatar Feb 22 '23 04:02 tyn1998

The label data here means the label data in OpenDigger, such as whether a repo belongs to a company or a foundation. Currently we have more than 10,000 repos and 380 orgs with labels, so it will cover many of the repo nodes and work well for color rendering.

frank-zsy avatar Feb 22 '23 06:02 frank-zsy

That sounds great!

tyn1998 avatar Feb 22 '23 06:02 tyn1998

I would like to generate all the data for OpenGalaxy by month, from 201501 to 202301, with continuous layout positions.

I will upload the data to OSS under folders named 201501 - 202301, which means you can set ?v=201812 in the URL to load the data for 201812. The default version would be the latest month, which is 202301 for now.
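As a rough sketch of how a client might resolve the `?v=` parameter to a month folder and fall back to the latest month: the host below is taken from the URL mentioned earlier in this thread, but the per-month folder layout, the `available` list, and both function names are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

# Host from the thread; the "<base>/<month>/<file>" layout is an assumption.
BASE = "https://oss.x-lab.info/open_galaxy"
# Accepts both 201812 and 2018-12 style month tags.
VERSION_RE = re.compile(r"^\d{4}-?(0[1-9]|1[0-2])$")


def resolve_version(page_url, available, default=None):
    """Return the month requested via ?v=..., or the latest available month."""
    # Fixed-width month tags sort chronologically as plain strings.
    fallback = default or max(available)
    requested = parse_qs(urlparse(page_url).query).get("v", [None])[0]
    if requested and VERSION_RE.match(requested) and requested in available:
        return requested
    return fallback


def data_url(version, filename):
    """Build the URL of one data file inside a month folder."""
    return f"{BASE}/{version}/{filename}"
```

For example, `resolve_version("https://example.com/?v=201812", ["201501", "201812", "202301"])` picks `201812`, while a URL without `v` falls back to `202301`.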

Does this make sense to you? @tyn1998

frank-zsy avatar Feb 23 '23 06:02 frank-zsy

Is yyyy-mm a valid folder name and a valid URL param? If so, I prefer yyyy-mm.

tyn1998 avatar Feb 23 '23 06:02 tyn1998

OK, I think it is
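A quick way to check this: the hyphen is an unreserved character in URLs, so a yyyy-mm tag passes through percent-encoding unchanged. The helper name `is_month_tag` is made up for this sketch.

```python
import re
from urllib.parse import quote, urlencode


def is_month_tag(tag):
    """True if tag looks like yyyy-mm with a month between 01 and 12."""
    return re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", tag) is not None


tag = "2023-01"
encoded_path = quote(tag, safe="")   # percent-encode as a path segment
query = urlencode({"v": tag})        # encode as a query string
print(is_month_tag(tag), encoded_path == tag, query)
```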

frank-zsy avatar Feb 23 '23 06:02 frank-zsy

@tyn1998 Do you plan to put more effort into OpenGalaxy? I found it is really hard to produce a continuous layout for the 3D galaxy, and I have not found a proper way to generate the data. Is the data for 2023-01 enough for now?

The parameters we should consider are:

  • How to set the bounds based on the current node count or total OpenRank. With a different node count, the galaxy size should be different.
  • How to determine whether the layout calculation has converged. Currently we use ngraph.offline.layout, an offline tool that calculates the layout for a fixed number of iterations, but we cannot tell whether the layout has converged.
  • How to check whether the layouts are continuous. Even if I use last month's layout positions as the initial positions for the next month's calculation, I still cannot know whether the generated layouts are continuous. In the ECharts force layout graph component, the graph may rotate for several rounds before it converges, so the layouts may not be continuous, but I cannot tell.

So I think generating layouts for a timeline will be a long-term task.
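One possible way to approach the convergence and continuity questions above is sketched below. It assumes node positions can be dumped as `{node_id: (x, y, z)}` dictionaries at two points in time; it does not rely on any ngraph.offline.layout API. The function names, the tolerance, and the pair-sampling scheme are illustrative choices. The continuity measure compares distances between pairs of shared nodes, which do not change when the whole galaxy rotates or shifts, so a pure rotation is not counted as drift.

```python
import math
import random


def mean_displacement(prev, curr):
    """Average distance each shared node moved between two snapshots."""
    shared = prev.keys() & curr.keys()
    if not shared:
        return 0.0
    return sum(math.dist(prev[n], curr[n]) for n in shared) / len(shared)


def has_converged(prev, curr, tol=1e-2):
    """Treat the layout as converged when nodes barely move between snapshots."""
    return mean_displacement(prev, curr) < tol


def shape_drift(layout_a, layout_b, samples=1000, seed=0):
    """Rotation- and translation-invariant drift between two layouts.

    Returns the mean relative change in distance over randomly sampled
    pairs of nodes present in both layouts (0.0 means identical shape).
    """
    shared = sorted(layout_a.keys() & layout_b.keys())
    if len(shared) < 2:
        return 0.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        u, v = rng.sample(shared, 2)
        da = math.dist(layout_a[u], layout_a[v])
        db = math.dist(layout_b[u], layout_b[v])
        total += abs(da - db) / max(da, db, 1e-9)
    return total / samples
```

`has_converged` would compare two snapshots from the same run a fixed number of iterations apart, while `shape_drift` would compare the final layouts of two consecutive months. Suitable thresholds for either would need to be found experimentally.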

frank-zsy avatar Feb 26 '23 02:02 frank-zsy

Hi, @frank-zsy, thanks a lot for your effort!

The knowledge and skills needed to generate continuous layouts for OpenGalaxy are indeed complex. I agree with you that more time and energy are needed to complete the challenge.

For now, the data of 2023-01 is enough for building a demo application.

tyn1998 avatar Feb 26 '23 03:02 tyn1998

@tyn1998 Thanks, I will look into the details in the future.

frank-zsy avatar Feb 26 '23 03:02 frank-zsy

Hi @frank-zsy, could you export OpenGalaxy data of 2023-02 and set it as the default data?

tyn1998 avatar Mar 18 '23 02:03 tyn1998

Hello @frank-zsy, could you export the data of 2023-09 and make it default?

tyn1998 avatar Oct 04 '23 08:10 tyn1998