wikipedia-corpus topic
chinese-wikipedia-corpus-creator
Corpus creator for Chinese Wikipedia
OPIEC
Reading the data from OPIEC, an Open Information Extraction corpus
ML-You-Can-Use
Practical ML and NLP with examples.
Wikipedia-Search-Engine
Builds a search engine over the 2013 Wikipedia data dump (43 GB). Search results are returned in real time.
wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
mediawiki-dump
Python package for working with MediaWiki XML content dumps
Wikipedia-Article-Scraper
A complete Python text analytics package that allows users to search for a Wikipedia article, scrape it, conduct basic text analytics and integrate it into a data pipeline without writing excessive code...
pyWikiMM
Collects a multimodal dataset of Wikipedia articles and their images