wikipedia-corpus topic

List wikipedia-corpus repositories

OPIEC

Stars: 36, Forks: 6

Reading the data from OPIEC - an Open Information Extraction corpus

ML-You-Can-Use

Stars: 34, Forks: 6

Practical ML and NLP with examples.

Wikipedia-Search-Engine

Stars: 23, Forks: 10

Builds a search engine over the 2013 Wikipedia data dump (43 GB). Search results are returned in real time.

wikipedia2corpus

Stars: 38, Forks: 3

Wikipedia text corpus for self-supervised NLP model training

mediawiki-dump

Stars: 19, Forks: 3

Python package for working with MediaWiki XML content dumps

Wikipedia-Article-Scraper

Stars: 17, Forks: 7

A complete Python text analytics package that lets users search for a Wikipedia article, scrape it, run basic text analytics, and integrate it into a data pipeline without writing excessive code...

pyWikiMM

Stars: 15, Forks: 2

Collects a multimodal dataset of Wikipedia articles and their images