common-words
Play with GitHub datasets by source{d}?
Hey, great job!
We are source{d}. We constantly analyze the world's source code, and we've released several datasets on data.world:
- Names in source code (not people's names!) extracted from 13,000,000 GitHub repositories. 30 GB.
- October 2016 GitHub repositories not marked as forks but very similar to each other.
- Readme files found in all GitHub repositories (16M, October 2016)
- ≈ 452,000,000 commits' metadata taken from 16,000,000 repositories on GitHub (Oct 2016)
Do you think you can use them to create the next cool visuals?
That is awesome! Thanks for sharing!
Have you considered adding your datasets to https://github.com/caesar0301/awesome-public-datasets ?
Thanks for pointing this out: https://github.com/caesar0301/awesome-public-datasets/pull/272