acl-anthology
Faulty RSS feed (no author, no proceedings)
Confirm that this is a bug report
- [x] I want to report an issue that does not concern paper or author metadata.
- [x] I have searched for similar existing issues first.
Problem Description
The RSS feed (https://aclanthology.org/papers/index.xml) is improperly formatted. It does not have author or proceedings fields, and puts this info in a field called "description". E.g.:
<item>
<title>“Say What?” Influence of Perceived Self-Confidence in English of Senior High School Students on their Willingness to Communicate in English</title>
<link>https://aclanthology.org/2024.paclic-1.113/</link>
<pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate>
<guid>2024.paclic-1.113</guid>
<description>Mark Joseph B. Zapanta in Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation</description>
</item>
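To make the problem concrete, here is a minimal Python sketch of what a feed reader gets out of the item above. The closing `</item>` tag is added so the snippet parses on its own, and the split on " in " is only an illustrative heuristic (an assumption, not something the feed guarantees), so it may well break on other entries.

```python
import xml.etree.ElementTree as ET

# The item from the feed, with a closing tag added so it parses on its own.
ITEM_XML = """<item>
<title>“Say What?” Influence of Perceived Self-Confidence in English of Senior High School Students on their Willingness to Communicate in English</title>
<link>https://aclanthology.org/2024.paclic-1.113/</link>
<pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate>
<guid>2024.paclic-1.113</guid>
<description>Mark Joseph B. Zapanta in Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation</description>
</item>"""

item = ET.fromstring(ITEM_XML)

# There is no dedicated author element, so a reader has nothing to map to "creator".
has_author = item.find("author") is not None

# Heuristic workaround (assumption): split the description on the first " in ".
description = item.findtext("description", default="")
author_guess, _, venue_guess = description.partition(" in ")

print(has_author)
print(author_guess)
print(venue_guess)
```

A reader like Zotero has no reason to apply such a split, which is why author and proceedings end up lumped together in one free-text field.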
I apologize for lying; this does in fact concern metadata but I saw no other way to report it.
@CSchoel Can you comment? Should this be in a different field or was there a specific reason for doing it this way?
I apologize for lying; this does in fact concern metadata but I saw no other way to report it.
No worries, this does not concern metadata as we interpret it. :)
Hi @mbollmann. 👋 I've stayed pretty close to the Hugo standard RSS template (https://github.com/gohugoio/hugo/blob/master/tpl/tplimpl/embedded/templates/rss.xml) when I created the feed.
There is an author element in the RSS standard (https://www.rssboard.org/rss-specification#ltauthorgtSubelementOfLtitemgt), which we currently don't use. However, according to the specification, that author field is supposed to contain the email address of the article author. I don't think the RSS standard itself has anything specific enough to capture publication metadata.
@jonmay, what would you expect the feed to look like? Can you give a corrected version of the example you posted? I suspect this would work through some metadata standard like DublinCore (https://www.dublincore.org/specifications/dublin-core/) or PRISM (https://www.w3.org/submissions/prism/), then?
The specific use case I have is getting the feed in zotero. I am not an expert in this area at all. But this works:
https://rss.arxiv.org/rss/cs
I see e.g.
[image: image.png]
Whereas for ACL I see
[image: image.png]
Unpacking a little, here's an entry from arxiv:
<item>
<title>Infinite Time Turing Machines and their Applications</title>
<link>https://arxiv.org/abs/2506.05351</link>
<description>arXiv:2506.05351v1 Announce Type: new Abstract: This work establishes a rigorous theoretical foundation for analyzing deep learning systems by leveraging Infinite Time Turing Machines (ITTMs), which extend classical computation into transfinite ordinal steps. Using ITTMs, we reinterpret modern architectures like Transformers, revealing fundamental limitations in scalability, efficiency, and interpretability. Building on these insights, we propose the Universal State Machine (USM), a novel computational paradigm designed from first principles. The USM employs a dynamic, queryable computation graph that evolves in real time, enabling modular, interpretable, and resource-efficient computation. This framework not only overcomes the inefficiencies and rigidity of current models but also lays the groundwork for scalable, generalizable artificial intelligence systems.</description>
<guid isPermaLink="false">oai:arXiv.org:2506.05351v1</guid>
<category>cs.CC</category>
<category>cs.AI</category>
<category>cs.FL</category>
<category>cs.LG</category>
<pubDate>Mon, 09 Jun 2025 00:00:00 -0400</pubDate>
<arxiv:announce_type>new</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<dc:creator>Rukmal Weerawarana, Maxwell Braun</dc:creator>
</item>
Here's an item from acl:
<item>
<title>“Say What?” Influence of Perceived Self-Confidence in English of Senior High School Students on their Willingness to Communicate in English</title>
<link>https://aclanthology.org/2024.paclic-1.113/</link>
<pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate>
<guid>2024.paclic-1.113</guid>
<description>Mark Joseph B. Zapanta in Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation</description>
</item>
So here's my attempt to hack the acl entry into arxiv format:
<item>
<title>“Say What?” Influence of Perceived Self-Confidence in English of Senior High School Students on their Willingness to Communicate in English</title>
<link>https://aclanthology.org/2024.paclic-1.113</link>
<pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate>
<guid isPermaLink="false">2024.paclic-1.113</guid> <!-- no idea about isPermaLink -->
<description>[put the abstract in here]</description>
<dc:creator>Mark Joseph B. Zapanta</dc:creator>
</item>
There's a "publication" field zotero shows that neither feed populates, but I imagine we could put the publication/journal name there. I have no idea if these fields come from some standard or not... like I said, not very well versed in this area, just back-engineering. Also, I have no clue how well this will propagate to other readers, but "creator" seems to be the key for author specifically. Hope this is sort of helpful.
jon
oh I guess "dc" means dublincore. So I guess follow dublincore, since arxiv is de facto a leader here?
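For anyone who wants to check what a Dublin Core aware client would pick up from such an entry, here is a rough Python sketch. It assumes the feed declares the standard Dublin Core namespace (`http://purl.org/dc/elements/1.1/`) under the `dc` prefix; the sample feed string is a cut-down stand-in for the hacked entry, not actual Anthology output, and `creators` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for namespaced lookups.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}

# Cut-down sample feed (assumption: not real Anthology output).
FEED_XML = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<item>
<title>Example paper</title>
<link>https://aclanthology.org/2024.paclic-1.113</link>
<guid isPermaLink="false">2024.paclic-1.113</guid>
<description>[put the abstract in here]</description>
<dc:creator>Mark Joseph B. Zapanta</dc:creator>
</item>
</channel>
</rss>"""


def creators(feed_xml):
    """Map each item's guid to the list of its dc:creator values."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.iter("item"):
        guid = item.findtext("guid")
        result[guid] = [c.text for c in item.findall("dc:creator", DC)]
    return result


found = creators(FEED_XML)
print(found)
```

On the `isPermaLink` question: in RSS 2.0 the attribute defaults to true, meaning the guid is a URL that points to the item. Setting it to "false" marks the guid as an opaque identifier, which seems to fit a bare Anthology ID.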
@jonmay As someone who's never used the RSS feed myself, that looks super helpful, thanks! The images (presumably) showing how it looks in Zotero didn't get preserved, though. I think you need to add those by replying on the GitHub website directly.
zotero looking at arxiv: [image]
zotero looking at acl: [image]
DublinCore should be easy to add, but that only adds the creator field. 🤔 I've had a brief look at PRISM, and it also allows for things like "volume" and "number", but I couldn't find a good way to add the proceedings title.
@mbollmann, what do you think? Just add DublinCore and be done with it, or is it worth looking deeper into what other metadata formats might be supported by Zotero?
Small disclaimer: I can't promise when I will have time for this, just that I could do this eventually at some point. DublinCore would be super easy, the main challenge for me would be to remember how to set up the ACL anthology repo and run tests. 😆
@mbollmann, what do you think? Just add DublinCore and be done with it, or is it worth looking deeper into what other metadata formats might be supported by Zotero?
Looks like the "dc:creator" field would be the most useful for Zotero purposes. And I agree that the "description" field would be great for the abstract. I'm not sure if we were concerned about the size of the feed, but if arXiv does it...
In principle, *ACL papers could also link to CC BY-4.0 under "dc:rights", but that would need careful checking that this is only added for *ACL papers.
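Putting those pieces together, an item with `dc:creator`, the abstract in `description`, and an optional `dc:rights` could be assembled roughly as below. This is only a Python sketch of the target output: the actual feed comes from a Hugo template, and `build_item`, its parameters, the `is_acl_venue` flag, and the placeholder paper ID in the second call are all hypothetical. Joining several authors with commas in a single `dc:creator` mirrors the arXiv entry quoted earlier in the thread.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"
ET.register_namespace("dc", DC_NS)  # serialize the namespace with the "dc" prefix


def build_item(paper_id, title, authors, abstract, pub_date, is_acl_venue=False):
    """Hypothetical helper that builds one RSS item with Dublin Core fields."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = f"https://aclanthology.org/{paper_id}/"
    ET.SubElement(item, "pubDate").text = pub_date
    # The Anthology ID is not a URL, so mark the guid as a non-permalink.
    ET.SubElement(item, "guid", isPermaLink="false").text = paper_id
    ET.SubElement(item, "description").text = abstract
    ET.SubElement(item, f"{{{DC_NS}}}creator").text = ", ".join(authors)
    # Only attach the license when the caller has verified the venue (assumption).
    if is_acl_venue:
        ET.SubElement(item, f"{{{DC_NS}}}rights").text = CC_BY_4
    return item


plain_xml = ET.tostring(
    build_item(
        "2024.paclic-1.113",
        "Example title",
        ["Mark Joseph B. Zapanta"],
        "[put the abstract in here]",
        "Tue, 27 May 2025 00:00:00 +0000",
    ),
    encoding="unicode",
)

# Placeholder ID and authors, purely for illustration.
licensed_xml = ET.tostring(
    build_item(
        "0000.example-1.1",
        "Example title",
        ["First Author", "Second Author"],
        "[put the abstract in here]",
        "Tue, 27 May 2025 00:00:00 +0000",
        is_acl_venue=True,
    ),
    encoding="unicode",
)

print(plain_xml)
print(licensed_xml)
```

Keeping the license behind an explicit flag reflects the concern above: the rights element should only appear where the venue has been checked, never by default.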
the main challenge for me would be to remember how to set up the ACL anthology repo and run tests. 😆
Nah, that's the easiest part! Just running make basically does everything for you, provided you have Hugo installed. :)