feedparser `content_text` incorrectly takes precedence over `content_html` when parsing JSON Feed

`content_text` incorrectly takes precedence over `content_html` when parsing JSON Feed, which makes it impossible to get `content_html` when both are present.

https://github.com/kurtmckee/feedparser/blob/e43242143ed90ee8cbf05078faf972f8de35a798/feedparser/parsers/json.py#L88-L97
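To make the problem concrete, here is a simplified, hypothetical sketch of the precedence pattern described above. It is not the code at the linked lines, and the function name `parse_content_single` and the MIME type strings are illustrative assumptions. When a single content value is filled by checking `content_text` first, an item that carries both fields never exposes its `content_html`.

```python
from typing import Any, Dict, Optional


def parse_content_single(item: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Hypothetical precedence-based parsing: at most one content value survives."""
    # content_text is checked first, so it wins whenever it is present
    if "content_text" in item:
        return {"type": "text/plain", "value": item["content_text"]}
    if "content_html" in item:
        return {"type": "text/html", "value": item["content_html"]}
    return None


item = {"id": "1", "content_text": "content", "content_html": "<p>content</p>"}
print(parse_content_single(item))  # {'type': 'text/plain', 'value': 'content'}
```

The HTML variant of `item` is silently dropped, and nothing downstream can recover it.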

According to the JSON Feed 1.1 spec (https://www.jsonfeed.org/version/1.1/), `content_text` and `content_html` are on equal footing; neither is designated as the preferred one:

> content_html and content_text are each optional strings — but one or both must be present.
>
> Note that it uses both content_text and content_html, which is completely valid. An app such as iTunes, for instance, might prefer to use content_text, while a feed reader might prefer content_html.
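The spec's "one or both must be present" rule is easy to check. This is a minimal sketch, assuming the item has already been decoded from JSON into a Python dict; the helper name `present_content_fields` is hypothetical and not part of feedparser.

```python
import json
from typing import Any, Dict


def present_content_fields(item: Dict[str, Any]) -> Dict[str, str]:
    """Return the content fields present on a JSON Feed item.

    Per JSON Feed 1.1, content_html and content_text are each optional
    strings, but at least one of them must be present.
    """
    fields = {
        key: item[key]
        for key in ("content_html", "content_text")
        if isinstance(item.get(key), str)
    }
    if not fields:
        raise ValueError("item has neither content_html nor content_text")
    return fields


raw = '{"id": "1", "content_text": "content", "content_html": "<p>content</p>"}'
item = json.loads(raw)
print(sorted(present_content_fields(item)))  # ['content_html', 'content_text']
```

Both keys come back for this item, which is exactly the valid case that a single precedence-based value cannot represent.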

A better approach, then, may be to follow what feedparser already does for Atom: make `entries[i].content` a list of dicts, e.g. `[{"type": "text/plain", "value": "content"}, {"type": "text/html", "value": "<p>content</p>"}]`.
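A minimal sketch of the proposed shape, assuming the item is a plain dict decoded from JSON. The names `parse_content_list` and `pick_content` are hypothetical, and plain `dict`s stand in for feedparser's own `FeedParserDict`. Every variant is kept, and the choice moves to the consumer.

```python
from typing import Any, Dict, List

# Map each JSON Feed content field to the MIME type proposed in the issue.
CONTENT_FIELDS = (
    ("content_text", "text/plain"),
    ("content_html", "text/html"),
)


def parse_content_list(item: Dict[str, Any]) -> List[Dict[str, str]]:
    """Collect every content variant instead of picking one by precedence."""
    content = []
    for field, mime_type in CONTENT_FIELDS:
        value = item.get(field)
        if isinstance(value, str):
            content.append({"type": mime_type, "value": value})
    return content


def pick_content(content: List[Dict[str, str]], preferred: str) -> Dict[str, str]:
    """Return the variant with the preferred type, else the first available one.

    Assumes content is non-empty, which the spec guarantees for valid items.
    """
    for entry in content:
        if entry["type"] == preferred:
            return entry
    return content[0]


item = {"id": "1", "content_text": "content", "content_html": "<p>content</p>"}
content = parse_content_list(item)
print(pick_content(content, "text/html")["value"])  # <p>content</p>
```

A feed reader would pass `"text/html"`, while an app like iTunes might pass `"text/plain"`; the parser itself no longer has to guess.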

Admittedly, such a change would break downstream projects that use the develop branch. Hopefully this won't be too painful, since JSON Feed support hasn't been released yet.

I'm willing to open a PR for this if you think it's feasible.

Dec 23 '24 19:12 Rongronggg9

Thanks for reporting this!

I think I may have caught this while working on a significant expansion of JSON Feed support. It's in a branch on this repo already, but I haven't worked on it in a while.

I think I can incorporate this issue report in that branch and get that merged, but I don't have a timeline for getting that done.

Dec 23 '24 19:12 kurtmckee

Good to know! I looked into https://github.com/kurtmckee/feedparser/tree/expand-json-feed-support and it looks like great work. The changes in that branch do fix the issue. Thanks for the information.

Dec 23 '24 19:12 Rongronggg9
