[Feed problem] Titles are not parsed properly.
Describe the bug
Titles of articles on certain feeds (for me, the most notable example is any Tumblr blog) do not have their HTML-escaped characters parsed correctly. Other readers (like RSS Guard's local account) parse them correctly.
To Reproduce
- Subscribe to any tumblr blog as a feed (e.g. https://splatoonusna.tumblr.com/rss )
- Fetch the latest articles.
- Check the titles until you see an unparsed character entity (the escaped form is shown instead of the character itself, e.g. ’).
- If this is an auto-generated title (not an ask, no headings used in post, etc.), compare it to the first paragraph of the body. The body text will be parsed correctly.
Expected behavior
The characters are correctly parsed and shown as the actual character.
FreshRSS version
1.27.1
System information
- Database version: SQLite
- PHP version: 8.2.29
- Installation type: Docker Compose
- Web server type: Apache
- Device: Likely Irrelevant (issue occurs both on Web UI and RSS Guard on a laptop)
- OS: Arch Linux (kernel 6.17.5)
- Browser: Floorp 12.3.3 (pulled from FF 144)
Additional context
The difference between the title and body seems to be that all content in the body is wrapped in at least some type of html tag (most commonly <p>), while the title isn't.
It may or may not be wise, but it's parsed properly as the XML that it is. :-) It's a lose-lose situation.
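To illustrate what "parsed properly as XML" means here, a minimal sketch in Python follows. The feed fragment is hypothetical (not copied from the real feed) and assumes the apostrophe is double-escaped as `&amp;#8217;` in both the title and the description. After one round of XML decoding, both still contain the literal text `&#8217;`. The description is HTML, so a browser decodes the entity when rendering it; a title treated as plain text shows it verbatim.

```python
import xml.etree.ElementTree as ET

# Hypothetical item, not copied from the real feed: the apostrophe is
# double-escaped (&amp;#8217;) in both the title and the description.
FEED = """<rss version="2.0"><channel><item>
<title>It&amp;#8217;s here</title>
<description>&lt;p&gt;It&amp;#8217;s here&lt;/p&gt;</description>
</item></channel></rss>"""

item = ET.fromstring(FEED).find("channel/item")

# One round of XML decoding turns &amp; into &, nothing more.
title = item.findtext("title")
description = item.findtext("description")

print(title)        # It&#8217;s here
print(description)  # <p>It&#8217;s here</p>
```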
Yes, as @Frenzie said, the feed has a wrong encoding of its titles, and we cannot support it, since it would otherwise break other legitimate use-cases:
https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fsplatoonusna.tumblr.com%2Frss
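A rough sketch of why decoding titles a second time would break legitimate use-cases, again in Python. Both titles are made-up examples and `parsed_title` is a hypothetical helper, not FreshRSS code. The extra `html.unescape` pass repairs the double-escaped title, but it also rewrites a correctly encoded title whose author really meant the literal text `&amp;`.

```python
import html
import xml.etree.ElementTree as ET


def parsed_title(xml_title: str) -> str:
    """Hypothetical helper: return the text of a <title> element after XML parsing."""
    return ET.fromstring(xml_title).text


# Double-escaped title, as assumed for the problematic feed.
broken = parsed_title("<title>It&amp;#8217;s here</title>")
# Correctly encoded title that is meant to display the text "&amp;".
legit = parsed_title("<title>Writing &amp;amp; in HTML</title>")

print(html.unescape(broken))  # It’s here
print(legit)                  # Writing &amp; in HTML
print(html.unescape(legit))   # Writing & in HTML  (no longer what the author wrote)
```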
Can you confirm this is a bug for other tumblr.com feeds? (That is pretty bad...)
Does Tumblr perhaps offer another feed format, such as Atom?