
Event material should be <attachment>, not <url>

Open · mbollmann opened this issue 2 years ago · 5 comments

We have recently introduced events, which can contain links to extra materials (such as the handbook) as well as talks, which in turn can link to videos, for example: https://github.com/acl-org/acl-anthology/blob/c7127b85f4f6979410f05424b0d6a3785ef3af38/data/xml/2022.acl.xml#L12299-L12322

These all use a <url> tag, which got an extra "type" attribute just for this purpose, for example:

      <url type="video">2022.acl.keynote1.mp4</url>

I propose removing this "type" attribute from <url> again and changing these entries to <attachment> instead. Attachments already support "type" attributes, and, more importantly, they are actually intended for local filenames, whereas <url>, until this change, was used exclusively for fully-qualified URLs. Additionally, this would mean adding checksums for these files, which seems like a good idea.
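A minimal sketch of the proposed migration for a single element, using Python's standard xml.etree.ElementTree. It assumes the checksum is stored in a hash attribute as 8 lowercase hex digits of a CRC32, which is an assumption about the format rather than a confirmed Anthology convention; the names crc32_hex and url_to_attachment, and the <talk> wrapper in the example input, are hypothetical.

```python
import xml.etree.ElementTree as ET
import zlib


def crc32_hex(data: bytes) -> str:
    """CRC32 of `data` as 8 lowercase hex digits (the assumed checksum format)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def url_to_attachment(elem: ET.Element, data: bytes) -> bool:
    """Rename a typed <url> to <attachment> and add a checksum, in place.

    `data` holds the bytes of the local file the element names. Untyped
    <url> elements (e.g. paper PDFs) are left unchanged. Returns True if
    the element was converted.
    """
    if elem.tag != "url" or "type" not in elem.attrib:
        return False
    elem.tag = "attachment"
    elem.set("hash", crc32_hex(data))
    return True


# Illustrative input; the <talk> wrapper stands in for the real event markup.
talk = ET.fromstring('<talk><url type="video">2022.acl.keynote1.mp4</url></talk>')
url_to_attachment(talk.find("url"), b"stand-in for the video's bytes")
print(ET.tostring(talk, encoding="unicode"))
```

Keying the conversion on the presence of "type" means the ordinary paper <url> entries pass through untouched, so the same pass could in principle run over a whole volume file.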

mbollmann · Aug 15 '23 11:08

Ah, one slight correction: <attachment> so far is only for local files, and requires a checksum. This is the only tag that behaves this way, so I would propose changing it to work like url-with-checksum does in the schema.
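One possible reading of this proposal, sketched as a standalone check: an <attachment> may name either a local file, which then needs a checksum, or a fully-qualified URL, which does not. This is an illustration of the rule under that assumption, not the actual RELAX NG schema; is_full_url and check_attachment are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def is_full_url(value: str) -> bool:
    """True for fully-qualified URLs such as https://vimeo.com/306168250."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_attachment(elem: ET.Element) -> list[str]:
    """Return the problems found with one <attachment> under the assumed rule."""
    problems = []
    target = (elem.text or "").strip()
    if not target:
        problems.append("empty attachment")
    elif not is_full_url(target) and "hash" not in elem.attrib:
        # Local filenames must carry a checksum; remote URLs need not.
        problems.append(f"local file {target!r} has no checksum")
    return problems
```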

mbollmann · Aug 15 '23 12:08

I also cannot find all the referenced files on the server, so I cannot prepare a PR with my suggestion :upside_down_face:

~/Downloads  wget "https://aclanthology.org/2022.acl.keynote1.mp4"                                                                                                                                     [14:06:02] 
--2023-08-15 14:06:23--  https://aclanthology.org/2022.acl.keynote1.mp4
Resolving aclanthology.org (aclanthology.org)... 174.138.37.75
Connecting to aclanthology.org (aclanthology.org)|174.138.37.75|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-15 14:06:23 ERROR 404: Not Found.

❌8 ~/Downloads  wget "https://aclanthology.org/attachments/2022.acl.keynote1.mp4"    [14:06:44]
--2023-08-15 14:06:57--  https://aclanthology.org/attachments/2022.acl.keynote1.mp4
Resolving aclanthology.org (aclanthology.org)... 174.138.37.75
Connecting to aclanthology.org (aclanthology.org)|174.138.37.75|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-15 14:06:58 ERROR 404: Not Found.

❌8 ~/Downloads  wget "https://aclanthology.org/events/acl-2022/2022.acl.keynote1.mp4"    [14:06:33]
--2023-08-15 14:06:43--  https://aclanthology.org/events/acl-2022/2022.acl.keynote1.mp4
Resolving aclanthology.org (aclanthology.org)... 174.138.37.75
Connecting to aclanthology.org (aclanthology.org)|174.138.37.75|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-15 14:06:44 ERROR 404: Not Found.
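The same lookup can be scripted so that further candidate locations are easy to try. A minimal sketch using only the standard library: it sends HEAD requests instead of downloading the files, and the candidate prefixes are just the three locations tried above. candidate_urls and http_status are hypothetical helper names.

```python
import urllib.error
import urllib.request

BASE = "https://aclanthology.org"
# The three locations tried above, in the same order.
PREFIXES = ["", "attachments/", "events/acl-2022/"]


def candidate_urls(filename: str, prefixes: list[str] = PREFIXES) -> list[str]:
    """Build one full URL per candidate directory prefix."""
    return [f"{BASE}/{prefix}{filename}" for prefix in prefixes]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code (e.g. 200 or 404).

    Network failures (no connection, DNS errors) still raise URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Usage (needs network access):
#   for url in candidate_urls("2022.acl.keynote1.mp4"):
#       print(http_status(url), url)
```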

mbollmann · Aug 15 '23 12:08

We also have <video> (e.g., <video href="https://vimeo.com/306168250" tag="video"/>). It would be nice to consolidate all of these, and I agree that <attachment> is the way to do it.
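A minimal sketch of that consolidation, assuming that <attachment> is extended to accept fully-qualified URLs without a checksum (as proposed earlier in this thread) and that the existing tag attribute of <video> maps onto the type attribute of <attachment>. Both are assumptions, and video_to_attachment is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def video_to_attachment(root: ET.Element) -> int:
    """Rewrite every <video href="..." tag="..."/> under `root` in place as
    <attachment type="...">href</attachment>. Returns the number converted.
    """
    converted = 0
    # Materialize the matches first so renaming elements cannot disturb iteration.
    for video in list(root.iter("video")):
        href = video.attrib.pop("href")
        kind = video.attrib.pop("tag", "video")  # assumed: tag -> type
        video.tag = "attachment"
        video.set("type", kind)
        video.text = href  # a remote URL, so no checksum is added
        converted += 1
    return converted
```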

(Related, but likely out of scope: <url> would be better named <pdf>.)

Let me look for the videos.

mjpost · Aug 15 '23 12:08

Okay, I started this but didn't finish it. The redirect may not be working, but the files are at anthology-files/videos/acl.

mjpost · Aug 15 '23 12:08

e.g., https://aclanthology.org/anthology-files/videos/acl/2022.acl.keynote1.mp4
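If the files follow this layout consistently, a video's full URL could be derived from its filename alone. A sketch under that assumption: the directory name is taken to be the venue, i.e. the second dot-separated component of the filename ("acl" in 2022.acl.keynote1.mp4), which only this single example confirms. video_url and VIDEO_BASE are hypothetical names.

```python
VIDEO_BASE = "https://aclanthology.org/anthology-files/videos"


def video_url(filename: str) -> str:
    """Map a filename such as 2022.acl.keynote1.mp4 to its anthology-files URL.

    Assumption: the directory is the venue, taken from the second
    dot-separated component of the filename.
    """
    parts = filename.split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected filename: {filename!r}")
    venue = parts[1]
    return f"{VIDEO_BASE}/{venue}/{filename}"
```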

mjpost · Aug 15 '23 12:08