github-explorer
How often is the dataset updated?
Hi!
The last publication about GitHub Explorer was in December 2020, and I am just curious: has the dataset been updated since then? I do regular research on government open-source code, and right now I use the GitHub API directly, but it would be great to use gh-api instead.
Best Regards, Ivan
The interactive dataset is updated every 10 minutes, but the data dumps (.xz archives) are not.
The article itself is not updated; maybe we can do an annual update.
Thanks a lot, sounds great! I think it would help other users too if the update schedule were mentioned in the GitHub README.md or on the https://ghe.clickhouse.tech/ website.
About my use of the dataset: I am working on a relaunch of the Open source government observatory project https://data.world/ibegtin/open-source-government-project. It's a rating of government openness based on the open-source activity of government agencies. It uses the government.github.com list of government orgs on GitHub, with some additions, and calculates country-level and country-group-level statistics on the amount of published code, activities, forks, stars, active developers, etc. for each country.
Since orgs are manually mapped to countries, some orgs related to a country are not found, but it's still quite useful.
I will try to use the GitHub Explorer API then and come back with feedback, and I will be happy to cooperate if some calculations could be made directly on the GE database.
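The per-country aggregation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the observatory project's actual code: the `RepoStats` record, the `aggregate_by_country` function, and the choice to sum only repos, stars, and forks are all hypothetical. Orgs missing from the manual mapping are collected separately, which makes the "not found" limitation visible instead of silently dropping them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class RepoStats:
    # Hypothetical per-repository record; real data would come from the dataset.
    org: str
    stars: int
    forks: int


def aggregate_by_country(
    repos: Iterable[RepoStats], org_to_country: Dict[str, str]
) -> Tuple[Dict[str, Dict[str, int]], Set[str]]:
    """Sum repo counts, stars, and forks per country; return unmapped orgs too."""
    # GitHub org names are case-insensitive, so normalize the mapping keys.
    lookup = {org.lower(): country for org, country in org_to_country.items()}
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"repos": 0, "stars": 0, "forks": 0}
    )
    unmapped: Set[str] = set()
    for repo in repos:
        country = lookup.get(repo.org.lower())
        if country is None:
            # Org is not in the manual mapping: record it rather than drop it.
            unmapped.add(repo.org)
            continue
        t = totals[country]
        t["repos"] += 1
        t["stars"] += repo.stars
        t["forks"] += repo.forks
    return dict(totals), unmapped
```

Reviewing the returned `unmapped` set after each run is one cheap way to grow the manual mapping over time.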
I've uploaded the latest dump: https://datasets.clickhouse.com/github_events/tsv/github_events_v3.tsv.xz It contains the data up to today: currently 5.4 billion events, 200 GB.
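At this size, it may be easier to stream the dump than to decompress it to disk first. Below is a minimal Python sketch using only the standard library. The helper names `iter_tsv_rows` and `count_column` are hypothetical, the column layout of the dump is not described here (so the caller must supply the column index), and backslash escapes inside fields are left as-is rather than unescaped. xz decompression is single-threaded, so a full pass over the file can take a long time.

```python
import lzma
from collections import Counter
from typing import BinaryIO, Iterator, List, Union


def iter_tsv_rows(source: Union[str, BinaryIO]) -> Iterator[List[str]]:
    """Yield TSV rows from an .xz file (path or binary file object), decompressing on the fly."""
    with lzma.open(source, mode="rt", encoding="utf-8") as f:
        for line in f:
            # Escape sequences such as \t inside fields are NOT unescaped here.
            yield line.rstrip("\n").split("\t")


def count_column(source: Union[str, BinaryIO], column: int) -> Counter:
    """Count distinct values in one column; the column index is caller-supplied."""
    counts: Counter = Counter()
    for row in iter_tsv_rows(source):
        if column < len(row):
            counts[row[column]] += 1
    return counts
```

For example, `count_column("github_events_v3.tsv.xz", i)` would tally the values of column `i` while reading the compressed file only once.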
Great! Thanks a lot!
Hi, @craigbox!
I decided to rewrite the update script in pure SQL for the sake of testing, and updates were paused for a few weeks. See https://github.com/ClickHouse/github-explorer/pull/20
I've deployed the new script, so it should continue being updated.