extraction-framework
The software used to extract structured data from Wikipedia
This PR automatically updates all Wikipedia settings from the Wikipedia API and adds recently added Wikipedia languages. Summary by CodeRabbit: New Features: Broadened disambiguation detection and redirect...
Additional templates
This is a temporary pull request to check how well older commits from the dev branch can be merged into the current master. Summary by CodeRabbit: New...
This PR refines the Amharic mappings and updates the local statistics and ignore list as part of GSoC 2025 contributions. Summary by CodeRabbit: Chores: Expanded the Amharic statistics ignore...
Changes required for the Hindi chapter.
New datasets for InfoboxReferencesExtractor
New version of the InfoboxReferencesExtractor. Also adds integration with the CitationExtractor.
New datasets for InfoboxReferencesExtractor
I haven't tested this on sample data yet, so it must not be merged. @chile12, could you check this?
Many places in South Africa appear to have incorrect coordinates. For example (though this seems to apply to most South African cities): http://dbpedia.org/page/Port_Elizabeth http://dbpedia.org/page/Johannesburg http://dbpedia.org/page/Cape_Town http://dbpedia.org/page/Centurion,_Gauteng On...