acl-anthology
acl-anthology copied to clipboard
Add section for announcements and RSS feeds
Closes https://github.com/acl-org/acl-anthology/issues/358. Closes https://github.com/CSchoel/acl-anthology/issues/2. Closes https://github.com/CSchoel/acl-anthology/issues/3.
This alters the existing RSS feeds that were generated by Hugo to be sorted by date and limited to 1000 items. The RSS feeds are mentioned in hugo/layouts/index.html, hugo/layouts/_default/index_flex.html, and hugo/content/news/announce_news.md. For the index.html I used a yellow button for the RSS feed (as the RSS logo is orange) and switched the feedback button to green. I also added the RSS feeds as a <link rel="alternate" ... to the header of each page.
For paper release dates, I used the ingest-date from the XML data, if available. For unknown ingest dates, I then tried to parse the date from the year and month fields of papers. The Python code runs without errors for all existing papers, but could break in the future due to unknown formats for the month field. If this happens, you can easily disable this by removing the --replace-date option in the Makefile where python3 bin/create_hugo_pages.py is called. I did not add paper abstracts to the RSS feed since that would increase the file size, which is already quite large.
Additionally, I added a folder hugo/content/news with a first news post and also mentioned that new section in hugo/layouts/index.html, hugo/layouts/_default/index_flex.html, and hugo/content/news/announce_news.md. Hugo automatically creates an RSS feed for this new section, so there was not more to do there. As mentioned in https://github.com/CSchoel/acl-anthology/issues/3, please feel free to use or discard the test post however you like.
There is only one open question left: I found the posts section, which seems to do almost the same as the new news section. Was it a conscious choice to abandon the old section? If not, we could use that one instead of news.
Please let me know if I should change anything else.
I checked the failing test: It seems that ruff didn't like the RSS template. I wonder if this is because it doesn't understand that this is a template and not an actual XML file? It might be a while before I have the time to investigate this further myself.
That's not ruff, it's a pre-commit hook. We might have to define an exception for the RSS template file.
I hope I can look at this PR sometime soon, but I'm still on vacation so it might be a week or two :)
Yeah, I noticed that it's a call where multiple things are happening at once.
I could reproduce the error locally. I hope I can find some time to look into this, but my calendar is pretty full for September. :see_no_evil:
For the failing "end-of-file-fixer" check, make sure to end the template file with a newline (or just run make check, it will change this automatically).
For the failing "check-xml" check, change the exclude pattern in .pre-commit-config.yaml to:
exclude: |
(?x)^(
hugo/layouts/sitemap.xml|
hugo/layouts/_default/rss.xml
)$
(Currently it just excludes the sitemap.xml, with this change, it excludes both sitemap.xml and the RSS template.)
The --replace-date option is a nice idea, but I'm probably not in favor of it; it adds a lot of complexity to the build scripts, and I don't think it'll actually be needed, because this only concerns older ingestions — new ingestions should always have an explicit ingestion date set, and new ingestions are what the RSS feed is for.
If you can fix the failing checks, I would then suggest we push your changes to a branch within this repo, so that an automatic preview is generated where we can look at the changes directly. :)
Hey there. :wave: It took me a while to find time for this again. :see_no_evil: Thanks for the detailed instructions on how to fix the checks. :+1:
I applied the changes you suggested, and the checks work locally for me now when I run make check. :white_check_mark:
The --replace-date option is a nice idea, but I'm probably not in favor of it; it adds a lot of complexity to the build scripts, and I don't think it'll actually be needed, because this only concerns older ingestions — new ingestions should always have an explicit ingestion date set, and new ingestions are what the RSS feed is for.
Good point. I mainly kept it because I thought there might not be a guarantee that the ingestion date is always provided for future ingestions. I disabled the flag in the Makefile, but I think its probably best if I also delete the related code, right? Less dead stuff to maintain.
@mbollmann Hi. I just wanted to let you know that I have a lot of spare time for private projects this week, so this would be the ideal time for me to wrap this up.
Please let me know if I can help with anything regarding the preview that you mentioned. :smiley:
Preview here: https://preview.aclanthology.org/cschoel_rss_and_blog/
Looks like the link on the front page is not built correctly, but the index appears to be here: https://preview.aclanthology.org/cschoel_rss_and_blog/papers/index.xml
I'll try to take a closer look later today or tomorrow. (Sorry, quite busy time for me...)
I wonder what the right balance is between providing information in the RSS feed vs. not making it too large. Is there some kind of "previewer" that allows you to input an RSS XML file and get back a preview of what it might look like in a feed reader?
As mentioned in another comment, I found this RSS viewer. It wants to load too many ad-related scripts for my taste, but with a script blocker, it does the job.
I would also like to test the new changes with an actual RSS reader such as Feedly. The last preview renders like this:
The "news" section should probably get an entry in the menu bar at the top of the page, too.
Good point. I just added it. :+1:
I'm getting an XML validation error now on the preview due to an unescaped & character in the <description> field. Maybe you need to apply https://gohugo.io/functions/transform/htmlescape/
Damn, I totally misinterpreted what safeHTML was supposed to do. I thought it meant "make this safe for printing to HTML". Instead, it means "this is safe for printing to HTML, don't do anything with it". :see_no_evil:
After removing the call, the & is properly escaped.
Damn, I totally misinterpreted what
safeHTMLwas supposed to do. I thought it meant "make this safe for printing to HTML". Instead, it means "this is safe for printing to HTML, don't do anything with it". 🙈
Ah, that's what happened! :grin: If I had more time to look at things right now, I might have noticed. ;)
Force-pushed updates to #2859.
I looked this over and think it's great.
There was one question about using /posts instead of /news. I'd love to do this, actually. We could recreate all three of these files as blog posts. I am not sure what the provenance is of the two reflections, but it would be nice to seed this new feature with those articles. It'd still be nice to keep the "News" link from the front page, though.
What do you think? Would this be too much work to make these adjustments?
I'm very sorry that I let this drop—thanks for this work, and the recent prod. I will make it a top priority to get this merged ASAP.
Done. :smile: I kept the word "news" instead of "posts" in all the visible texts on the page and used the /posts directory as the link targets. I noticed that the existing posts didn't have a description, so I changed the implementation to prioritize .Description if it is available, and otherwise use the automatically generated .Summary. I also used the same logic in the RSS news feed for consistency.
I'm very sorry that I let this drop—thanks for this work, and the https://github.com/acl-org/acl-anthology/issues/358#issuecomment-2094860295. I will make it a top priority to get this merged ASAP.
No problem. I also forgot about this for a while since I had a lot on my plate. Happy to see that there is still interest in this. :smile:
@mbollmann do you have time to approve? I'd like to get two sign-offs here. If you don't think you can manage this week, I may just go ahead.
Thanks for the reviews! Happy to see this feature make it into the website. I just added the RSS feed to my Feedly account. :hugs: :partying_face: