[Feed problem] Titles are not parsed properly.

Open anton-exe opened this issue 2 months ago • 2 comments

Describe the bug

On certain feeds (the most notable example for me being any Tumblr blog), HTML-escaped characters in article titles are not decoded correctly. Other readers (like RSS Guard's local account) decode them correctly.

To Reproduce

  1. Subscribe to any Tumblr blog as a feed (e.g. https://splatoonusna.tumblr.com/rss).
  2. Fetch the latest articles.
  3. Check the titles until you see an escaped character that was not decoded, like ’
  • If this is an auto-generated title (not an ask, no headings used in the post, etc.), compare it to the first paragraph of the body, where the same character is decoded correctly.

Expected behavior

The escaped characters are decoded and shown as the actual characters.

FreshRSS version

1.27.1

System information

  • Database version: SQLite
  • PHP version: 8.2.29
  • Installation type: Docker Compose
  • Web server type: Apache
  • Device: likely irrelevant (the issue occurs both in the web UI and in RSS Guard on a laptop)
  • OS: Arch Linux (kernel 6.17.5)
  • Browser: Floorp 12.3.3 (pulled from FF 144)

Additional context

The difference between the title and the body seems to be that all content in the body is wrapped in at least one HTML tag (most commonly <p>), while the title is not.
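
A minimal sketch of why the title and body can end up looking different. It assumes the title is displayed as plain text after XML parsing while the body is rendered as HTML, and that the feed escapes its character references twice. `SAMPLE_ITEM` is a hypothetical item written for illustration, not a capture of the real Tumblr feed, and `TextExtractor` and `rendered_text` are made-up helper names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical RSS item (an assumption, not copied from a real feed).
# The apostrophe is written as a character reference whose "&" has
# itself been escaped to "&amp;".
SAMPLE_ITEM = """<item>
  <title>It&amp;#8217;s here</title>
  <description>&lt;p&gt;It&amp;#8217;s here&lt;/p&gt;</description>
</item>"""


class TextExtractor(HTMLParser):
    """Collects the text a browser would display for an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def rendered_text(html_fragment):
    """Return the visible text of an HTML fragment, with references decoded."""
    parser = TextExtractor()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.parts)


item = ET.fromstring(SAMPLE_ITEM)

# One level of escaping is removed by the XML parser itself.
title_as_text = item.findtext("title")

# The body gets a second decoding pass because it is interpreted as HTML.
body_as_html = rendered_text(item.findtext("description"))

print(repr(title_as_text))
print(repr(body_as_html))
```

Under these assumptions the title keeps a literal character reference, while the body shows the real apostrophe because HTML rendering decodes it a second time.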

anton-exe avatar Oct 27 '25 12:10 anton-exe

It may or may not be wise, but it's parsed properly as the XML that it is. :-) It's a lose-lose situation.

Frenzie avatar Oct 27 '25 12:10 Frenzie

Yes, as @Frenzie said, the feed encodes its titles incorrectly, and we cannot support that, since doing so would break other legitimate use cases:

https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fsplatoonusna.tumblr.com%2Frss

Can you confirm this is a bug for other tumblr.com feeds? (That is pretty bad...)

Does Tumblr perhaps offer another feed format, such as Atom?

Alkarex avatar Oct 29 '25 22:10 Alkarex
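
As a rough illustration of the trade-off described in this thread, the sketch below applies a second HTML-unescape pass to two titles. Both feed fragments are hypothetical examples written for this purpose, and `title_text` is a made-up helper name. This is not how FreshRSS handles titles; it only shows why a blanket second pass could be lossy.

```python
import html
import xml.etree.ElementTree as ET


def title_text(item_xml):
    """Return the title exactly as an XML parser decodes it."""
    return ET.fromstring(item_xml).findtext("title")


# Hypothetical double-escaped title, similar in spirit to the reported bug.
double_escaped = "<item><title>It&amp;#8217;s here</title></item>"

# Hypothetical correctly encoded title whose intended text contains
# the literal characters "&amp;".
legitimate = "<item><title>How to write &amp;amp; in HTML</title></item>"

broken_title = title_text(double_escaped)
correct_title = title_text(legitimate)

# A second unescape pass repairs the first title...
repaired = html.unescape(broken_title)

# ...but changes the meaning of the second one.
damaged = html.unescape(correct_title)

print(repr(broken_title), "->", repr(repaired))
print(repr(correct_title), "->", repr(damaged))
```

In this sketch the same extra pass that fixes the double-escaped title also turns the correctly encoded one into different text, which is the kind of breakage a reader has to avoid.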