
D11-1120 attachment is a different version of paper

Open jowagner opened this issue 2 years ago • 0 comments

Issue description

Both the PDF and the Attachment buttons on https://aclanthology.org/D11-1120/ produce a PDF that looks like the research paper, but the two files have small but important differences; see details below.

Steps to reproduce the issue

  1. Visit https://aclanthology.org/D11-1120/
  2. Download PDF and Attachment
  3. Extract and compare text
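Step 3 can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used for this report: it assumes the text of both files has already been extracted to plain strings (for example with a tool such as `pdftotext`), and the helper name `word_differences` is hypothetical. It uses Python's standard `difflib` to list the word-level spans where the two texts disagree.

```python
import difflib


def word_differences(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (fragment_a, fragment_b) pairs for every span where the texts differ.

    Hypothetical helper: both arguments are assumed to be plain text that was
    already extracted from the two PDFs. Splitting on whitespace makes the
    comparison ignore line-wrapping differences between the files.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps frequent words from being treated as noise in long texts.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(words_a[a_start:a_end]), " ".join(words_b[b_start:b_end]))
            )
    return differences


# Example with one phrase pair quoted in this report.
print(word_differences("tweets per user is 22", "tweets per user is 21"))
```

Comparing word by word rather than line by line avoids false positives from differing line breaks, which are likely when one file carries a proceedings footer and page numbers and the other does not.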

What's the expected result?

The attachment provides substantial additional information, e.g. an appendix, datasets or code.

The paper does not refer to any supplementary material, appendix, code or method for obtaining the data set created for this work, so there is no specific expectation of what the correct attachment would contain. However, I hoped to find a list of 183,729 Twitter handles (account names) annotated with binary gender, as described in the paper.

What's the actual result?

The attachment appears to be a variant of the PDF, most likely an older version with some conflicting information.

Additional details / screenshot

Differences (PDF vs. attachment):

  • proceedings footer and page numbers present vs not present
  • "tweets per user is 22" vs "tweets per user is 21"
  • "is fairly consistent" vs "is consistent"
  • "180,000 vectors" vs "190,000 vectors"
  • "over 3 million training" vs "over 4 million training"
  • "Skene (1979)" vs "Skene ()"

The last difference suggests that the attachment is an earlier draft, but it could also be the other way around (e.g. the "year" field was accidentally removed from the bib entry).

jowagner, Apr 03 '23 15:04