wikipedia-dump topic

List of wikipedia-dump repositories

jivesearch

402
Stars
53
Forks

A search engine that doesn't track you.

wikipedia-mirror

331
Stars
29
Forks

🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

wp2txt

169
Stars
39
Forks

A command-line toolkit to extract text content and category data from Wikipedia dump files

OPIEC

36
Stars
6
Forks

Reading the data from OPIEC - an Open Information Extraction corpus

explicit-semantic-analysis

34
Stars
9
Forks

Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch

wp2git

33
Stars
1
Forks

Downloads and imports Wikipedia page histories to a git repository

wikidump_preprocessing

26
Stars
5
Forks

Extracting useful metadata from Wikipedia dumps in any language.

mediawiki-dump

19
Stars
3
Forks

Python package for working with MediaWiki XML content dumps

IndexWikipedia

21
Stars
5
Forks

A simple utility to index Wikipedia dumps using Lucene.