Data Labelling

Open Gautamshahi opened this issue 3 years ago • 3 comments

Hello,

How do you label the articles? For example, in the data collected from the URL http://beforeitsnews.com/awakening-start-here/2018/01/awakening-of-12-strands-of-dna-reconnecting-with-you-movie-10623.html there is no information indicating that the article is fake.

Please explain if I am missing something.

Gautamshahi avatar Apr 21 '21 19:04 Gautamshahi

Hello @Gautamshahi.

The way articles are labeled is described in the README file; see the section "How was the corpus created?".

There, you can find the statement:

Each article has been attributed the same label as the label associated with its domain.

So, the article is labeled as fake because beforeitsnews is considered an unreliable news source.
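To make the rule concrete, here is a minimal sketch of domain-level label propagation. It is not the corpus's actual code: the `DOMAIN_LABELS` mapping and the `label_article` function are hypothetical names used for illustration, and the single entry only mirrors the example discussed in this thread.

```python
from urllib.parse import urlparse

# Hypothetical domain-to-label mapping (assumption for illustration);
# the corpus relies on its own list of sources and labels.
DOMAIN_LABELS = {
    "beforeitsnews.com": "fake",
}


def label_article(url, domain_labels=DOMAIN_LABELS):
    """Return the label attached to the article's domain, or None if unknown."""
    host = urlparse(url).hostname or ""
    # Drop a leading "www." so both host forms match the same entry.
    if host.startswith("www."):
        host = host[4:]
    return domain_labels.get(host)
```

Note that the article text is never inspected: every article from a listed domain receives the same label, which is exactly why the content of an individual page may carry no sign of being fake.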

pmacinec avatar Apr 23 '21 06:04 pmacinec

Hello. Yes, but considering only the domain type for annotation adds a lot of bias. Did you publish any paper using this approach?

Gautamshahi avatar Apr 23 '21 08:04 Gautamshahi

Yes, you are right. In research, using datasets labeled by source credibility is not recommended; however, this annotation technique is used in the majority of papers. The reason is simple: it is hard to label articles any other way. There are datasets with manual labeling, but they contain only a few samples. A better way, in my opinion, is to label articles by mapping them to verified claims (e.g. from fact-checking sites) and determining the stance of the article toward each claim.

If you are interested, you can check other datasets in my overview of datasets for fake news detection: Fake News Datasets.

pmacinec avatar Apr 23 '21 08:04 pmacinec