Ambiguous semantics of the "url" field

Open mjpost opened this issue 5 years ago • 5 comments

I recall a prior discussion of this issue, but can't find it.

In the XML, <url> is used to denote the presence of a PDF. On the paper page, it is used to (a) generate the canonical URL and (b) generate the link to the PDF. This creates a problem if there is no PDF: there will be no <url> field, and therefore the paper page "URL" field is blank, for example here.

I think we should do the following:

  1. Remove this ambiguity in the XML by converting all <url> instances to <pdf>.
  2. (optional) Repurpose <url> to list the paper's Anthology ID, and make its presence mandatory.

(2) is redundant, but is often useful in code, since a paper's full Anthology ID can be recovered directly, rather than having to be reconstructed from its ID, its (volume) parent ID, and its (collection) grandparent ID.
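
For illustration, here is a minimal sketch of what "reconstructing" the full ID involves. It assumes the new-style ID layout (`<collection id>-<volume id>.<paper id>`, e.g. `2020.acl-main.1`) and ignores the older letter-based IDs, which are padded differently. The sample XML is invented and heavily simplified, and `full_ids` is a hypothetical helper, not a function from the Anthology library.

```python
import xml.etree.ElementTree as ET

# Invented, simplified sample; real Anthology XML carries many more fields.
SAMPLE = """
<collection id="2020.acl">
  <volume id="main">
    <paper id="1">
      <title>An Example Paper</title>
      <url>2020.acl-main.1</url>
    </paper>
    <paper id="2">
      <title>A Paper Without a PDF</title>
    </paper>
  </volume>
</collection>
""".strip()


def full_ids(xml_text):
    """Yield (full_id, paper_element) pairs.

    Assumption: new-style IDs only, built as collection-volume.paper.
    """
    collection = ET.fromstring(xml_text)
    collection_id = collection.get("id")
    for volume in collection.iter("volume"):
        volume_id = volume.get("id")
        for paper in volume.iter("paper"):
            # The paper's own id is only the last component; the rest
            # comes from its parent and grandparent elements.
            yield f"{collection_id}-{volume_id}.{paper.get('id')}", paper


ids = [full_id for full_id, _ in full_ids(SAMPLE)]
print(ids)
```

Note that the second paper gets a full ID even though it has no <url> element, which is the case that currently leaves the paper page's "URL" field blank.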

mjpost • Sep 10 '20 15:09

(2) is not only redundant, but also a potential source of nasty bugs if, for whatever reason, the ID specified there doesn't match the constructed one. The Anthology library should make getting the full ID easy; if it doesn't make it easy enough, we should improve our library instead.

I'll try to find/recall our previous discussion on (1).

mbollmann • Sep 18 '20 22:09

Another idea is we change <url> to <pdf> entirely in the XML, and then just generate url: in the YAML for everything.
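
A rough sketch of that idea, under stated assumptions: it renames every <url> under a paper to <pdf>, then builds one export entry per paper in which `url` is always derived from the full ID and `pdf` appears only when the element exists. The entries are plain dicts standing in for the YAML data. `convert_and_export`, `PAGE_BASE`, and the sample XML are all hypothetical, and the ID layout assumes new-style IDs only.

```python
import xml.etree.ElementTree as ET

# Invented, simplified sample; paper 2 has no PDF.
SAMPLE_VOLUME = """
<volume id="main">
  <paper id="1"><title>With PDF</title><url>2020.acl-main.1</url></paper>
  <paper id="2"><title>Without PDF</title></paper>
</volume>
""".strip()

# Placeholder base address, not the real Anthology URL.
PAGE_BASE = "https://example.org/anthology/"


def convert_and_export(volume_xml, collection_id):
    """Rename <url> to <pdf> and build one export entry per paper."""
    volume = ET.fromstring(volume_xml)
    entries = {}
    for paper in volume.iter("paper"):
        # Step 1: <url> only ever signalled "there is a PDF", so rename it.
        for node in paper.findall("url"):
            node.tag = "pdf"
        # Step 2: the page URL no longer depends on a PDF being present.
        full_id = f"{collection_id}-{volume.get('id')}.{paper.get('id')}"
        entry = {"url": f"{PAGE_BASE}{full_id}/"}
        pdf = paper.find("pdf")
        if pdf is not None:
            entry["pdf"] = pdf.text
        entries[full_id] = entry
    return volume, entries


volume, entries = convert_and_export(SAMPLE_VOLUME, "2020.acl")
for full_id, entry in entries.items():
    print(full_id, entry)
```

With this split, a paper without a PDF still gets a non-empty `url`, and the presence of `pdf` becomes the only signal that a PDF exists.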

mjpost • Sep 19 '20 00:09

Another idea is we change to entirely in the XML, and then just generate in the YAML for everything.

I've read this several times but I don't get what it means, sorry :)

mbollmann • Sep 23 '20 15:09

Sorry, I didn't escape my URL tags, so they were omitted. I just corrected it.

mjpost • Sep 23 '20 15:09

Ah! Yeah, I'm a bit puzzled (also while investigating #998) why we're even using this field to generate links to the page itself. I think the reason goes back to the possibility of having external URLs (like for LREC) in there, and that's also what our previous discussion was about. What should the URL field of an externally linked paper be? Historically it used to point to the PDF, not the Anthology paper page, and that's why we kept this and why things currently work the way they do. (example)

If we just want the absolute URL for the Anthology-internal page, there should be no need to generate or store this anywhere except in create_hugo_pages.py.

mbollmann • Sep 23 '20 15:09