How often is the dataset updated?

Open ivbeg opened this issue 2 years ago • 7 comments

Hi!

The last publication about GitHub Explorer was in December 2020, and I'm curious: has the dataset been updated since then? I regularly research government open-source code. Right now I use the GitHub API directly, but it would be great to use gh-api instead.

Best Regards, Ivan

ivbeg avatar Dec 16 '21 08:12 ivbeg

The interactive dataset is updated every 10 minutes. The data dumps (.xz archives) are not updated.
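A 10-minute update cadence suggests a simple freshness check: compare the newest event timestamp you can see against the current time. The following is a minimal sketch only. The helper name `is_stale` and the 30-minute tolerance are assumptions chosen for illustration, and how the newest timestamp is obtained is left out.

```python
from datetime import datetime, timedelta, timezone


def is_stale(latest_event: datetime, now: datetime,
             max_lag: timedelta = timedelta(minutes=30)) -> bool:
    """Return True if the newest event is older than max_lag.

    Hypothetical helper: the 30-minute default is an assumed tolerance
    on top of the 10-minute update interval, not a documented value.
    """
    return now - latest_event > max_lag


# Example with fixed timestamps so the result does not depend on the clock.
now = datetime(2021, 12, 16, 21, 0, tzinfo=timezone.utc)
recent = datetime(2021, 12, 16, 20, 55, tzinfo=timezone.utc)
old = datetime(2021, 12, 1, 0, 0, tzinfo=timezone.utc)

print(is_stale(recent, now))  # False
print(is_stale(old, now))     # True
```

Passing `now` explicitly keeps the function easy to test; in real use it would be `datetime.now(timezone.utc)`.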

alexey-milovidov avatar Dec 16 '21 21:12 alexey-milovidov

The article itself is not updated; maybe we can do an annual update.

alexey-milovidov avatar Dec 16 '21 21:12 alexey-milovidov

Thanks a lot, sounds great! I think it would help other users too if the update schedule were mentioned in the GitHub README.md or on the https://ghe.clickhouse.tech/ website.

As for how I use the dataset: I am working on a relaunch of the Open source government observatory project, https://data.world/ibegtin/open-source-government-project. It is a rating of government openness based on the open-source activity of government agencies. It uses the government.github.com list of government orgs on GitHub, with some additions, and calculates country-level and country-group-level statistics for each country: amount of published code, activity, forks, stars, active developers, etc.

Since orgs are manually mapped to countries, it has a limitation: some orgs related to a country are not found. Still, it's quite useful.

I will try the GitHub Explorer API then and come back with feedback. I would be happy to cooperate if some calculations could be made directly on the GE database.

ivbeg avatar Dec 17 '21 08:12 ivbeg

I've uploaded the latest dump: https://datasets.clickhouse.com/github_events/tsv/github_events_v3.tsv.xz

It contains the data up to today - now 5.4 billion events, 200 GB.
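At this size it is usually more practical to stream a `.tsv.xz` archive than to unpack it first. The following is a minimal sketch under stated assumptions: the archive has already been downloaded locally, it is tab-separated with one event per line, and the helper name `iter_rows` is hypothetical. No column layout is assumed, and the demo runs on a tiny sample file created on the spot rather than on the real dump.

```python
import lzma
import os
import tempfile
from typing import Iterator, List


def iter_rows(path: str) -> Iterator[List[str]]:
    """Yield each line of an xz-compressed TSV file as a list of fields.

    Hypothetical helper. Fields are split on literal tab characters and
    escape sequences inside fields are left as-is.
    """
    # lzma.open in text mode decompresses incrementally, so the whole
    # archive never has to fit in memory or be unpacked to disk.
    with lzma.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n").split("\t")


# Demo on a small sample file (made-up values, not real dump contents).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv.xz")
with lzma.open(sample_path, mode="wt", encoding="utf-8") as f:
    f.write("a\tb\tc\n")
    f.write("d\te\tf\n")

row_count = sum(1 for _ in iter_rows(sample_path))
print(row_count)  # 2
```

To read the real archive, pass the path of the downloaded file to `iter_rows` and stop early (for example with `itertools.islice`) when only a preview is needed.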

alexey-milovidov avatar Dec 15 '22 23:12 alexey-milovidov

Great! Thanks a lot!

ivbeg avatar Dec 20 '22 08:12 ivbeg

Hi Alexey,

Is the pipeline still working? This query stops at Jan 5.

craigbox avatar Jan 28 '24 02:01 craigbox

@craigbox, Hi!

I decided to rewrite the update script to pure SQL for the sake of testing, and it was paused for a few weeks. See https://github.com/ClickHouse/github-explorer/pull/20

I've deployed the new script, so it should continue being updated.

alexey-milovidov avatar Jan 28 '24 17:01 alexey-milovidov