debate-cards
Open evidence and Wiki scraper
Downloads round data and open source documents/cites from the debate wiki, and also downloads files from openev. Uses the wiki's REST API to pull the data. The main limiting factor is the server's response speed, but a full run should only take a day or two. In total there are around 320k rounds across the wikis, roughly half of which have open source documents, plus around 10k open ev files.
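As a rough illustration of what pulling round data over the REST API looks like, here is a minimal sketch. The base URL, the XWiki-style endpoint shape, and the response field names are assumptions for illustration, not the project's actual client code.

```typescript
// Sketch: list the pages in one wiki space via an XWiki-style REST endpoint.
// Base URL is a placeholder; configure it for the actual wiki deployment.
const BASE_URL = process.env.WIKI_REST_URL ?? 'https://example-wiki/rest';

interface PageSummary {
  id: string;
  title: string;
  space: string;
}

async function fetchSpacePages(wiki: string, space: string): Promise<PageSummary[]> {
  // XWiki-style pages resource; media=json asks for a JSON response
  const url = `${BASE_URL}/wikis/${wiki}/spaces/${space}/pages?media=json`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  const body = await res.json();
  // XWiki typically returns summaries under `pageSummaries`; adjust if the schema differs
  return (body.pageSummaries ?? []).map((p: any) => ({
    id: p.id,
    title: p.title,
    space: p.space,
  }));
}
```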
Todo:
- Implement adding new rounds as they are created. This can be done with roughly one request per new round, so it could simply be run once per day (see the sketch after this list).
- Add tags to downloaded files.
- Maybe add some sort of parsing of round reports and/or cites, e.g. extract links from cites and try to split round reports by speech.
- Better error handling in the parser for weird formats.
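A very rough sketch of the incremental-update idea from the first item: once per day, ask each wiki for pages modified since the last run and download only those rounds. `fetchModifiedPagesSince` and `saveRound` are hypothetical helpers passed in as parameters; they do not exist in this repo.

```typescript
import { setTimeout as sleep } from 'timers/promises';

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

type FetchModified = (wiki: string, since: Date) => Promise<string[]>;
type SaveRound = (wiki: string, pageId: string) => Promise<void>;

async function pollNewRounds(
  wikis: string[],
  fetchModifiedPagesSince: FetchModified,
  saveRound: SaveRound,
): Promise<void> {
  let lastRun = new Date(0); // in a real run this would be persisted between restarts
  for (;;) {
    const since = lastRun;
    lastRun = new Date();
    for (const wiki of wikis) {
      // one listing request per wiki, then roughly one request per new round
      const changed = await fetchModifiedPagesSince(wiki, since);
      for (const pageId of changed) {
        await saveRound(wiki, pageId);
      }
    }
    await sleep(ONE_DAY_MS);
  }
}
```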
Trying to run this, but the application seems to hang (I think while trying to load spaceData?). Any idea what's up?
Sorry, should have clarified: loading the list of rounds to download takes a long time (something like 30 minutes, IIRC). If you want to load data quicker for testing, you can add a .slice(0, 2) or .slice(0, 1) on these two lines so you only load the full data for a few of the wikis: https://github.com/arvind-balaji/debate-cards/blob/e401edee268797b5afb22bcf6b9ff349e9e5eac4/src/lib/debate-tools/wiki.ts#L76-L78 https://github.com/arvind-balaji/debate-cards/blob/e401edee268797b5afb22bcf6b9ff349e9e5eac4/src/lib/debate-tools/wiki.ts#L85 In the future it would probably be a good idea to add some way of configuring which wikis to load.
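One possible shape for that configuration option: filter the full wiki list against an environment variable instead of editing .slice() calls by hand. This is only a sketch; `loadWikiList` is an assumed helper standing in for however the repo actually enumerates wikis.

```typescript
// Sketch: restrict the run to wikis named in the WIKIS env var.
async function selectWikis(loadWikiList: () => Promise<string[]>): Promise<string[]> {
  const all = await loadWikiList();
  const filter = process.env.WIKIS; // e.g. WIKIS="hspolicy,hsld" to load only those wikis
  if (!filter) return all; // no filter set: load everything, as it does today
  const wanted = new Set(filter.split(',').map((name) => name.trim()));
  return all.filter((wiki) => wanted.has(wiki));
}
```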
The wiki was just updated and the API overhauled; the terms now also ban bulk downloads of data. I have a dump of most of the relevant data, though.