wikipedia-dump topic
jivesearch
A search engine that doesn't track you.
wikipedia-mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
wp2txt
A command-line toolkit to extract text content and category data from Wikipedia dump files
chinese-wikipedia-corpus-creator
Corpus creator for Chinese Wikipedia
OPIEC
Reading the data from OPIEC - an Open Information Extraction corpus
explicit-semantic-analysis
Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch
wp2git
Downloads and imports Wikipedia page histories to a git repository
wikidump_preprocessing
Extracting useful metadata from Wikipedia dumps in any language.
mediawiki-dump
Python package for working with MediaWiki XML content dumps
IndexWikipedia
A simple utility to index Wikipedia dumps using Lucene.