Fix setImageFromContent regex to extract full image URI

Open Xiryl opened this issue 9 months ago • 0 comments

Issue

Because setImageFromContent was only matching URLs up to the file extension, it stripped off any query parameters. In practice this means:

All Reddit feeds (e.g. https://www.reddit.com/r/minecraft.rss) would fall back to a link without its query string.
In the official demo app PR, and in any client application using this library, those links no longer resolve correctly, resulting in broken or non-functional article URLs.
The bug affects any RSS or Atom feed where a valid link must include query parameters.

For example, given the URL:

https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fhg67qgm2ii6f1.png%3Fwidth%3D640%26crop%3Dsmart%26auto%3Dwebp%26s%3D8af0554e2f97a5f9d0d744c5b542071265948a52

the old regex would truncate it to:

https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fhg67qgm2ii6f1.png

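which is not a usable URL: the percent-encoded query parameters (width, crop, auto, s) are lost, so the link no longer resolves to the image.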
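
The truncation can be reproduced with a minimal Python sketch. The pattern here is a hypothetical stand-in for an extension-anchored regex, not the library's actual expression, and the `<img>` wrapper is assumed for illustration:

```python
import re

# Hypothetical stand-in for the old pattern: the match ends at the
# first image file extension, so everything after it is dropped.
OLD_IMAGE_URL = re.compile(r"https?://\S+?\.(?:png|jpe?g|gif|webp)", re.IGNORECASE)

# Feed content containing the URL from this issue (wrapper tag is assumed).
content = (
    '<img src="https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it'
    '%2Fhg67qgm2ii6f1.png%3Fwidth%3D640%26crop%3Dsmart%26auto%3Dwebp'
    '%26s%3D8af0554e2f97a5f9d0d744c5b542071265948a52" />'
)

match = OLD_IMAGE_URL.search(content)
truncated = match.group(0) if match else None
print(truncated)
```

Under these assumptions the match stops right after `.png`, which is the truncated form shown above.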

Fix

Added URL decoding: before running the URL regex, we now unescape the most common XML/HTML entities (&amp;amp;, &amp;, &quot;, &lt;, &gt;) so that ampersands and other characters are restored to their literal form.
Extended the URL regex to capture query parameters.
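
The two steps above might look roughly like the following Python sketch. It is not the library's code: `extract_image_url` and `IMAGE_URL` are hypothetical names, the pattern is an assumed illustration, and `html.unescape` from the standard library decodes a wider set of entities than the ones listed here.

```python
import html
import re
from typing import Optional

# Hypothetical pattern: match lazily up to an image extension, then keep
# any trailing characters (such as a query string) until whitespace, a
# quote, or an angle bracket is reached.
IMAGE_URL = re.compile(
    r"https?://[^\s\"'<>]+?\.(?:png|jpe?g|gif|webp)[^\s\"'<>]*",
    re.IGNORECASE,
)


def extract_image_url(content: str) -> Optional[str]:
    """Return the first image URL in content, query string included."""
    # Step 1: unescape entities so that "&amp;" becomes a literal "&".
    decoded = html.unescape(content)
    # Step 2: run the extended regex on the decoded text.
    match = IMAGE_URL.search(decoded)
    return match.group(0) if match else None
```

A short usage example, with an assumed `example.com` URL for the entity case and the Reddit URL from this issue for the query-string case:

```python
escaped = '<img src="https://example.com/a.png?width=640&amp;crop=smart" />'
print(extract_image_url(escaped))

reddit = (
    "https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it"
    "%2Fhg67qgm2ii6f1.png%3Fwidth%3D640%26crop%3Dsmart%26auto%3Dwebp"
    "%26s%3D8af0554e2f97a5f9d0d744c5b542071265948a52"
)
print(extract_image_url(reddit) == reddit)
```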

Jun 19 '25 09:06 Xiryl