GSoC-Data
GSoC Data from 2005 to 2018 in JSON format.
GSoC Data
All the data from GSoC-archive in JSON format.
NOTE: To run the scrapers, you must install the following dependencies:
- asyncio
- aiohttp
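You can do that by running: pip install asyncio aiohttp
(asyncio is part of the standard library in Python 3.4 and later, so on a current interpreter only aiohttp needs to be installed.)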
Directories
- Data/
  - orgs/ - all orgs that have been a part of GSoC from 2005 to 2017
  - projects/ - all projects that were completed under the GSoC program from 2005 to 2017
- Scrapers/ - contains all the scrapers used for scraping the data
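Given this layout, all files of one category can be read in a few lines. The sketch below is only an illustration: `load_records` is a hypothetical helper that is not part of this repository, and it assumes that each `.json` file holds a JSON array of objects, which should be checked against the actual files.

```python
import json
from pathlib import Path


def load_records(directory):
    """Load every .json file in `directory`, keyed by file stem (e.g. "2005").

    Assumption: each file contains a JSON array of objects.
    """
    records = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records[path.stem] = json.load(handle)
    return records
```

For example, `load_records("Data/orgs")` would return a dictionary mapping each file name (without the extension) to that file's parsed contents.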
Data
orgs/
- 2005.json - 2008.json
  - link: URL of the org
  - name: Name of the org
- 2009-2013.json
  - about: Work that the org does
  - link: URL of the org
  - mail: Mailing list of the org
  - name: Name of the org
  - page: Idea page of the org
- 2014-2015.json
  - link: URL of the org
  - mail: Mailing list of the org
  - page: Idea page of the org
  - name: Name of the org selected
- 2016-2017.json
  - about: Info about the organization
  - link: URL of the org
  - name: Name of the org
projects/
- 2005.json - 2008.json
  - Mentor: Name of the mentor of the project
  - project: Name of the project
  - student: Name of the student
- 2009-2013.json & 2014-2015.json
  - Organization: Name of the organization
  - detail: Detail about the project
  - link: Link to the project
  - student: Name of the student selected
  - title: Name of the project
- 2016-2017.json
  - Organization: Name of the organization
  - link: Link to the project
  - mentors: Names of the mentors
  - student: Name of the student
  - title: Name of the project
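Because the project keys differ between year ranges (for instance `project` versus `title`, and `Mentor` versus `mentors`), it can help to map every record onto one shape before analysing it. The following is a minimal sketch, not code from this repository: `normalize_project` and its output key names are hypothetical, and the mentor value is passed through unchanged because this README does not say whether it is a string or a list.

```python
def normalize_project(record):
    """Map a project record from any year range onto one set of keys.

    Keys that a given year range does not provide come back as None.
    """
    return {
        "organization": record.get("Organization"),
        # 2005-2008 files use "project"; later files use "title".
        "title": record.get("title", record.get("project")),
        "student": record.get("student"),
        # 2005-2008 files use "Mentor"; 2016-2017 files use "mentors".
        # Assumption: the value is kept as-is, whatever its type.
        "mentors": record.get("mentors", record.get("Mentor")),
        "detail": record.get("detail"),
        "link": record.get("link"),
    }
```

Records from the 2005-2008 files then expose the project name under `title`, the same key as the later years, so a single loop can process all of them.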
What can be done with the data?
This data will be used for improving the functionality of Soccer.
It can also be used to generate various stats, plots or answer data-related questions like:
- Who has completed the most GSoCs, and under which org?
- Which org has the highest student-to-mentor conversion rate? (students who first did GSoC under the org and then became mentors)
- Run some magic on the project descriptions over the years to find out whether there is a trend toward ML-related projects.
etc.
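As a rough starting point for the first and third questions, the sketch below counts projects per student and measures how many projects mention a keyword. It makes several assumptions: the function names and the default keyword list are hypothetical, `projects` is a list of dictionaries using the `student`, `title` and `detail` keys described above, and students are matched by exact name, so two people who share a name are counted as one.

```python
from collections import Counter


def most_frequent_students(projects, top=3):
    """Return the `top` (student, count) pairs across all project records."""
    counts = Counter(
        record["student"].strip()
        for record in projects
        if record.get("student")
    )
    return counts.most_common(top)


def keyword_share(projects, keywords=("machine learning", "neural")):
    """Fraction of projects whose title or detail mentions any keyword."""
    if not projects:
        return 0.0
    hits = 0
    for record in projects:
        text = f"{record.get('title', '')} {record.get('detail', '')}".lower()
        if any(keyword in text for keyword in keywords):
            hits += 1
    return hits / len(projects)
```

Calling `keyword_share` once per year and comparing the results gives a crude trend line. Only the 2009-2015 files carry a `detail` field, and the 2005-2008 files store the project name under `project` rather than `title`, so comparisons across year ranges should be read with care.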
Feel free to open issues to discuss any more ideas!