`content_text` incorrectly takes precedence over `content_html` when parsing JSON Feed
`content_text` incorrectly takes precedence over `content_html` when parsing JSON Feed, making it impossible to get `content_html` when an item contains both.
https://github.com/kurtmckee/feedparser/blob/e43242143ed90ee8cbf05078faf972f8de35a798/feedparser/parsers/json.py#L88-L97
According to https://www.jsonfeed.org/version/1.1/, `content_text` and `content_html` are on equal footing:
> `content_html` and `content_text` are each optional strings, but one or both must be present.
>
> Note that it uses both `content_text` and `content_html`, which is completely valid. An app such as iTunes, for instance, might prefer to use `content_text`, while a feed reader might prefer `content_html`.
Thus, a better approach might be to follow Atom: make `entries[i].content` a list of dicts, e.g. `[{"type": "text/plain", "value": "content"}, {"type": "text/html", "value": "<p>content</p>"}]`.
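A minimal sketch of that shape, assuming the item is a plain dict from `json.loads`. `parse_item_content` is a hypothetical helper, not feedparser's API. The `type`/`value` keys follow the example above, and any other keys feedparser's Atom content entries may carry are omitted. Both body fields are kept instead of one winning, and the spec's "one or both must be present" rule is enforced.

```python
from typing import Any

# Hypothetical mapping of JSON Feed body fields to MIME types, in the order
# used by the Atom-style example above.
_CONTENT_FIELDS = (
    ("content_text", "text/plain"),
    ("content_html", "text/html"),
)


def parse_item_content(item: dict[str, Any]) -> list[dict[str, str]]:
    """Return every body field of a JSON Feed item as an Atom-style list.

    Hypothetical helper for illustration; not part of feedparser.
    """
    content = [
        {"type": mime, "value": item[key]}
        for key, mime in _CONTENT_FIELDS
        if isinstance(item.get(key), str)  # both fields are optional strings
    ]
    if not content:
        # JSON Feed 1.1: one or both of content_html / content_text must be present
        raise ValueError("item has neither content_text nor content_html")
    return content


item = {
    "id": "1",
    "content_text": "content",
    "content_html": "<p>content</p>",
}
print(parse_item_content(item))
# [{'type': 'text/plain', 'value': 'content'}, {'type': 'text/html', 'value': '<p>content</p>'}]
```

A consumer can then choose by type, e.g. a feed reader might take `next(c["value"] for c in content if c["type"] == "text/html")`, while a plain-text client takes the `text/plain` entry. Neither field is lost.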
Admittedly, such a change would break downstream projects that use the develop branch. Hopefully this won't be too painful, since JSON Feed support hasn't been released yet.
I'm willing to open a PR for this if you think it's feasible.
Thanks for reporting this!
I think I may have caught this while working on a significant expansion of JSON Feed support. It's in a branch on this repo already, but I haven't worked on it in a while.
I think I can incorporate this issue report into that branch and get it merged, but I don't have a timeline for that.
Good to know! I looked into https://github.com/kurtmckee/feedparser/tree/expand-json-feed-support and it looks like a great piece of work. The changes in that branch do fix the issue. Thanks for the information.