Malayalam-Newspaper-Article-Dataset
This project scraped articles from the website of a Malayalam newspaper (Janmabhumi) to create a corpus of news articles. A set of queries was also created, and the corresponding ground-truth answers were retrieved using a combination of the BM25 and TF-IDF methods. The dataset can be useful for building tools such as stemmers, stopword removers, and lemmatizers.
The dataset includes news articles from 2014 to 2018.
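The description above mentions retrieving ground-truth answers with a combination of BM25 and TF-IDF, but does not say how the two were combined. The following is a minimal sketch of one possible approach, not the repository's actual code: it assumes whitespace tokenization, max-normalizes each score list, and averages the two. The function names (`bm25_scores`, `tfidf_scores`, `rank`), the parameters `k1` and `b`, and the sample strings are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization (no stemming or stopword removal).
    return text.lower().split()


def doc_freq(docs):
    # Number of documents each term appears in.
    return Counter(t for d in docs for t in set(d))


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Okapi BM25 with the commonly used non-negative IDF variant.
    n = len(docs)
    avgdl = (sum(len(d) for d in docs) / n) or 1.0
    df = doc_freq(docs)
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def tfidf_scores(query, docs):
    # Cosine similarity between TF-IDF vectors, using a smoothed IDF.
    n = len(docs)
    df = doc_freq(docs)

    def vec(tokens):
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                for t, c in Counter(tokens).items()}

    def cosine(a, v):
        dot = sum(w * v.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (na * nv) if na and nv else 0.0

    q = vec(query)
    return [cosine(q, vec(d)) for d in docs]


def normalize(scores):
    # Scale scores to [0, 1] so the two methods are comparable.
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]


def rank(query, articles, top_k=3):
    # Assumption: the combined score is the mean of the two normalized scores.
    docs = [tokenize(a) for a in articles]
    q = tokenize(query)
    bm = normalize(bm25_scores(q, docs))
    tf = normalize(tfidf_scores(q, docs))
    combined = [(x + y) / 2 for x, y in zip(bm, tf)]
    order = sorted(range(len(articles)), key=lambda i: combined[i], reverse=True)
    return order[:top_k]


# Placeholder strings standing in for scraped articles.
articles = [
    "kerala rain flood news",
    "cricket match india wins",
    "kerala election results announced",
]
print(rank("kerala flood", articles, top_k=2))
```

`rank` returns the indices of the highest-scoring articles, which could then be recorded as the candidate ground-truth answers for a query. Averaging normalized scores is only one way to combine the two rankers; rank fusion or a weighted sum would be equally plausible.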
## Note
This repo is obsolete, and scraping no longer works on the site mentioned above.
DATASET
Directly download the complete dataset from Dropbox
Email : [email protected]
Related Works
A similar repo with a Telugu dataset can be found here.