acl-anthology
Guidance for program/workshop chairs regarding paper metadata?
I am a workshop organizer and find that the START final submission form needs a nontrivial amount of customization.
Given that author name cleanliness has been an issue, I am including
- an editable Authors field (I don't know if this is present in the final submission form by default)
- below that, a mandatory checkbox: "I have checked that the order/spelling/capitalization of author names is correct and matches the PDF."
Should the Anthology provide official guidelines as to what should be on the final submission form, in order to improve the quality of the metadata to be ingested? Maybe there are other recommendations that would make sense as well.
Digging around in START a bit more, I see there is a tool called "Title Case Formatter for Titles/Authors" which suggests capitalization fixes using heuristics. I suggest we recommend this IN ADDITION to recommending that authors double-check. Care should also be taken that changes made in the formatter tool are reflected in the PDF.
I don't have this fresh in memory, so screenshots would help. But a note that name formatting is drawn from the global profile could be helpful.
It would be a good service to chairs to have this documented. I would put this in https://github.com/acl-org/acl-pub (perhaps after the consolidation).
As for title case protection, I wonder if we can just handle that on the Anthology side. We got a few people at ACL 2020 who {D}id {S}omething {L}ike {T}his to their titles. Perhaps documentation closer to the actual edit field would stave that off.
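To make the "handle it on the Anthology side" idea concrete, here is a minimal sketch of how over-protected titles could be detected and cleaned up. The function names and the "every word is protected" rule are assumptions for illustration, not existing Anthology code.

```python
import re

# A single brace-protected letter at the start of a word, e.g. "{D}id".
_PROTECTED_INITIAL = re.compile(r"\{([A-Za-z])\}(?=[a-z])")


def is_overprotected(title: str) -> bool:
    """Hypothetical check: True if every word starts with a protected capital."""
    words = title.split()
    if len(words) < 2:
        return False
    return all(re.match(r"\{[A-Z]\}", word) for word in words)


def strip_initial_protection(title: str) -> str:
    """Remove braces around protected word-initial letters, keeping the letters."""
    return _PROTECTED_INITIAL.sub(r"\1", title)


if __name__ == "__main__":
    title = "{D}id {S}omething {L}ike {T}his"
    if is_overprotected(title):
        print(strip_initial_protection(title))
```

A title with only one protected word (say, a proper noun) is left alone by the detector, so legitimate protection would not be touched.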
> name formatting is drawn from the global profile
Only for registered authors. At SemEval we are finding a lot of the papers have unregistered coauthors.
I think it would be nice if START gave a warning to authors if any name is either all upper case or all lower case. There were a huge number of these in ACL 2020. Both Chinese names that were all lower case and Europeans using full caps in the family name. It would be nice to catch these before they even get to the program chair, much less the anthology.
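A rough sketch of the kind of check such a warning could rest on. This is not START functionality; the function name and the decision to skip fields with fewer than two cased letters (initials such as "J.") are my assumptions.

```python
def is_suspicious_case(field: str) -> bool:
    """Return True if a name field is entirely upper case or entirely lower case.

    Fields with fewer than two cased letters (e.g. the initial "J.") and
    fields in scripts without case are never flagged.
    """
    cased = [c for c in field if c.isupper() or c.islower()]
    if len(cased) < 2:
        return False
    return field.isupper() or field.islower()


if __name__ == "__main__":
    for candidate in ["ZHANG", "wei", "Wei", "J.", "van der Berg"]:
        flag = "warn" if is_suspicious_case(candidate) else "ok"
        print(f"{candidate}: {flag}")
```

Checking the whole field rather than each token means a family name with lower-case particles is not flagged as long as part of it is capitalized.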
Agreed. But I wonder if we could just handle this ourselves? That is:
- Truecase if an existing variant exists
- Otherwise, just capitalize (e.g., {MATT,matt,mAtT} → Matt)
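The two rules above could be sketched roughly as follows. `known_variants` is a hypothetical stand-in for whatever list of already-ingested name spellings the Anthology would consult; nothing here reflects existing ingestion code.

```python
from typing import Iterable


def truecase_name(name: str, known_variants: Iterable[str]) -> str:
    """Recase a name: reuse a known spelling if one exists, else capitalize.

    `known_variants` is an assumed collection of spellings already on record.
    """
    by_folded = {}
    for variant in known_variants:
        # Keep the first spelling seen for each case-insensitive form.
        by_folded.setdefault(variant.casefold(), variant)

    folded = name.casefold()
    if folded in by_folded:
        return by_folded[folded]

    # Fallback: capitalize each whitespace-separated part.
    return " ".join(part.capitalize() for part in name.split())


if __name__ == "__main__":
    for raw in ["MATT", "matt", "mAtT"]:
        print(truecase_name(raw, []))
    print(truecase_name("MCDONALD", ["McDonald"]))
```

The fallback is deliberately naive: with no known variant on record, a name with internal capitals or a hyphen would come out wrong, which is presumably where the remaining 1% lives.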
The argument being that:
- It's a pain to get anyone else to do things, much less to do them right
- We'll still have this problem for people who choose to ignore START or who use another management system
- This problem seems solvable at least to 99%
I worry about the Anthology recasing in ways that are not transparent to the user. So better if we can implement it in START as well. For SemEval we had a checkbox in the final submission form requiring authors to double-check name spelling and capitalization; maybe START could trigger such a confirmation when entering a name in the author field if and only if the name doesn't conform to a regular expression.
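For illustration, the "confirm only if the name doesn't match a regular expression" idea might look like this. The pattern is my own guess at a conservative "Capitalized Word" shape covering ASCII and Latin-1 letters only; neither START nor the Anthology defines such a pattern, and a real one would need broader Unicode coverage.

```python
import re

# Assumed pattern: one or more parts separated by a space or hyphen, each an
# upper-case letter followed by lower-case letters, apostrophes, or periods.
_UPPER = "A-ZÀ-ÖØ-Þ"
_LOWER = "a-zß-öø-ÿ'’."
NAME_PATTERN = re.compile(
    rf"[{_UPPER}][{_LOWER}]*(?:[ -][{_UPPER}][{_LOWER}]*)*"
)


def needs_confirmation(name: str) -> bool:
    """Return True if the author should be asked to confirm the name as typed."""
    return NAME_PATTERN.fullmatch(name.strip()) is None


if __name__ == "__main__":
    for candidate in ["Matt Post", "MATT POST", "matt post", "Anna-Maria Silva"]:
        print(candidate, needs_confirmation(candidate))
```

Because a mismatch only triggers a confirmation prompt rather than a rejection, legitimate names that fall outside the pattern (internal capitals, lower-case particles) cost the author one extra click and are never recased behind their back.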
(And if we're asking START to add functionality, could we ask for ORCID fields as well?)