wikipedia-dump topic
wikipedia-dump repositories
jivesearch
402
Stars
53
Forks
A search engine that doesn't track you.
wikipedia-mirror
331
Stars
29
Forks
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
wp2txt
169
Stars
39
Forks
A command-line toolkit to extract text content and category data from Wikipedia dump files
chinese-wikipedia-corpus-creator
41
Stars
8
Forks
Corpus creator for Chinese Wikipedia
OPIEC
36
Stars
6
Forks
Reading the data from OPIEC - an Open Information Extraction corpus
explicit-semantic-analysis
34
Stars
9
Forks
Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch
wp2git
33
Stars
1
Forks
Downloads and imports Wikipedia page histories to a git repository
wikidump_preprocessing
26
Stars
5
Forks
Extracting useful metadata from Wikipedia dumps in any language.
mediawiki-dump
19
Stars
3
Forks
Python package for working with MediaWiki XML content dumps
IndexWikipedia
21
Stars
5
Forks
A simple utility to index Wikipedia dumps using Lucene.