FakeNewsCorpus
Data Labelling
Hello,
How do you label the articles? For example, in the data collected from the URL http://beforeitsnews.com/awakening-start-here/2018/01/awakening-of-12-strands-of-dna-reconnecting-with-you-movie-10623.html, there is no information indicating that the article is fake.
Please let me know if I am missing something.
Hello @Gautamshahi.
The labeling method is described in the README file, in the section "How was the corpus created?".
There, you can find the statement:
Each article has been attributed the same label as the label associated with its domain.
So the article is labeled as fake because beforeitsnews is considered an unreliable news source.
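A minimal Python sketch of this domain-based labeling, for illustration only. The `DOMAIN_LABELS` table, the `domain_of` and `label_article` helpers, and the stripping of a leading `www.` are hypothetical assumptions, not the corpus's actual scripts or source list. The point it shows is that only the URL's domain decides the label; the article text is never inspected.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical domain-to-label table (assumption): the real corpus uses its
# own list of sources and labels, which is not reproduced here.
DOMAIN_LABELS = {
    "beforeitsnews.com": "fake",
}


def domain_of(url: str) -> str:
    """Return the URL's host, lower-cased, with a leading 'www.' removed (assumed normalization)."""
    host = (urlparse(url).hostname or "").lower()
    return host[len("www."):] if host.startswith("www.") else host


def label_article(url: str, domain_labels: dict[str, str]) -> str | None:
    """Give an article the label of its domain, or None if the domain is not listed."""
    return domain_labels.get(domain_of(url))


if __name__ == "__main__":
    url = ("http://beforeitsnews.com/awakening-start-here/2018/01/"
           "awakening-of-12-strands-of-dna-reconnecting-with-you-movie-10623.html")
    print(label_article(url, DOMAIN_LABELS))  # fake
```

Because every article from a listed domain inherits the same label, any article whose content does not match its source's reputation is mislabeled, which is the bias raised below.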
Hello, yes, but labeling articles based only on their domain introduces a lot of bias into the annotation. Have you published any paper using this approach?
Yes, you are right. In research, using datasets labeled by source credibility is generally not recommended; however, this annotation technique is used in the majority of papers. The reason is simple: it is hard to label articles any other way. There are manually labeled datasets, but they contain only a few samples. In my opinion, a better way is to label articles by mapping them to verified claims (e.g. from fact-checking sites) and determining the stance of each article toward the claim.
If you are interested, you can check other datasets in my overview of datasets for fake news detection, Fake News Datasets.