acl-anthology
Event material should be <attachment>, not <url>
We recently introduced events, which can contain links to extra materials (such as the handbook), and talks, which can link to videos. For example: https://github.com/acl-org/acl-anthology/blob/c7127b85f4f6979410f05424b0d6a3785ef3af38/data/xml/2022.acl.xml#L12299-L12322
These all use a <url> tag, which got an extra "type" attribute just for this purpose, for example:
<url type="video">2022.acl.keynote1.mp4</url>
I propose removing this "type" attribute from <url> again and using <attachment> instead. Attachments already support a "type" attribute and, what's more, they are actually intended for local filenames, unlike <url>, which until this change was used exclusively for fully qualified URLs. This would also mean adding a checksum for these files, which seems like a good idea.
Ah, one slight correction: <attachment> so far is only for local files, and requires a checksum. This is the only tag that behaves this way, so I would propose changing it to work like url-with-checksum does in the schema.
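To make the proposal concrete, here is a rough sketch of what the conversion could look like. It rests on several assumptions that are not confirmed anywhere in this thread: that the checksum is stored in a `hash` attribute, that it is a CRC32 written as 8 hex digits, and that file contents are available through a `read_file` callback (a hypothetical helper). The `<talk>` snippet is only an illustration, not a copy of the real XML.

```python
import xml.etree.ElementTree as ET
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 of data as 8 lowercase hex digits (assumed checksum format)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def url_to_attachment(root: ET.Element, read_file) -> int:
    """Turn every <url type="..."> under root into an <attachment>.

    read_file is a hypothetical callback that maps a filename to its bytes,
    or returns None if the file cannot be found. The "hash" attribute name
    is an assumption. Returns the number of converted elements.
    """
    converted = 0
    # Materialise the iterator first, since we rename elements while looping.
    for elem in list(root.iter("url")):
        if "type" not in elem.attrib:
            continue  # plain <url> elements are left untouched
        elem.tag = "attachment"
        data = read_file((elem.text or "").strip())
        if data is not None:
            elem.set("hash", crc32_hex(data))
        converted += 1
    return converted


# Illustrative input only; the real event XML may be structured differently.
talk = ET.fromstring(
    '<talk><title>Keynote 1</title>'
    '<url type="video">2022.acl.keynote1.mp4</url></talk>'
)
url_to_attachment(talk, lambda name: b"placeholder bytes")
print(ET.tostring(talk, encoding="unicode"))
```

Files that cannot be found are still renamed but get no checksum, so they remain easy to spot afterwards.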
I also cannot find all the referenced files on the server, so I cannot prepare a PR with my suggestion :upside_down_face:
$ wget "https://aclanthology.org/2022.acl.keynote1.mp4"
--2023-08-15 14:06:23-- https://aclanthology.org/2022.acl.keynote1.mp4
Resolving aclanthology.org (aclanthology.org)... 174.138.37.75
Connecting to aclanthology.org (aclanthology.org)|174.138.37.75|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-15 14:06:23 ERROR 404: Not Found.
$ wget "https://aclanthology.org/attachments/2022.acl.keynote1.mp4"
--2023-08-15 14:06:57-- https://aclanthology.org/attachments/2022.acl.keynote1.mp4
Resolving aclanthology.org (aclanthology.org)... 174.138.37.75
Connecting to aclanthology.org (aclanthology.org)|174.138.37.75|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-15 14:06:58 ERROR 404: Not Found.
$ wget "https://aclanthology.org/events/acl-2022/2022.acl.keynote1.mp4"
--2023-08-15 14:06:43-- https://aclanthology.org/events/acl-2022/2022.acl.keynote1.mp4
Resolving aclanthology.org (aclanthology.org)... 174.138.37.75
Connecting to aclanthology.org (aclanthology.org)|174.138.37.75|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-15 14:06:44 ERROR 404: Not Found.
We also have <video> (e.g., <video href="https://vimeo.com/306168250" tag="video"/>). It would be nice to consolidate all of these, and I agree that <attachment> is the way to do it.
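As a rough illustration of the consolidation, the sketch below rewrites `<video href="..." tag="..."/>` into an `<attachment>` whose text is the URL. Mapping the old `tag` attribute onto `type` is my assumption, as is the idea that `<attachment>` would accept remote URLs (which depends on the schema change proposed above). No checksum is added here, since the target is not a local file.

```python
import xml.etree.ElementTree as ET


def video_to_attachment(root: ET.Element) -> int:
    """Rewrite <video href="..." tag="..."/> elements as <attachment type="...">.

    The tag-to-type mapping is an assumption; any other attributes are kept.
    Returns the number of converted elements.
    """
    converted = 0
    for elem in list(root.iter("video")):
        href = elem.attrib.pop("href", None)
        if href is None:
            continue  # nothing to move into the element text
        kind = elem.attrib.pop("tag", "video")
        elem.tag = "attachment"
        elem.set("type", kind)
        elem.text = href
        converted += 1
    return converted


# The <paper> wrapper is only there to make the example parseable.
paper = ET.fromstring(
    '<paper><video href="https://vimeo.com/306168250" tag="video"/></paper>'
)
video_to_attachment(paper)
print(ET.tostring(paper, encoding="unicode"))
```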
(Related, but likely out of scope: <url> would be better if it were named <pdf>.)
Let me look for the videos.
Okay, I started this but didn't finish it. The redirect may not be working, but the files are at anthology-files/videos/acl.
e.g., https://aclanthology.org/anthology-files/videos/acl/2022.acl.keynote1.mp4
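For anyone who wants to check the remaining files, here is a minimal sketch that builds the URL from a filename and tests it with a HEAD request. It assumes the venue directory is always the second dot-separated field of the filename (true for the example above, but not verified for other events), and `video_url` and `is_reachable` are names made up for this sketch.

```python
import urllib.request

BASE = "https://aclanthology.org/anthology-files/videos"


def video_url(filename: str) -> str:
    """Build the video URL, assuming the venue is the second dot-separated field."""
    parts = filename.split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected filename: {filename!r}")
    venue = parts[1]
    return f"{BASE}/{venue}/{filename}"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answers with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors such as 404, connection failures, and timeouts.
        return False
```

Calling `is_reachable(video_url("2022.acl.keynote1.mp4"))` would then tell you whether that file is being served, without downloading it.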