Fix setImageFromContent regex to extract full image uri

Open Xiryl opened this issue 9 months ago • 0 comments

Issue

Because setImageFromContent was only matching URLs up to the file extension, it stripped off any query parameters. In practice this means:

  • All Reddit feeds (e.g. https://www.reddit.com/r/minecraft.rss) fall back to a link without its query string.
  • In the official demo app, and in any client application using this library, those links no longer resolve correctly, resulting in broken or non-functional article URLs.
  • The bug affects any RSS or Atom feed where a valid link must include query parameters.

For example, given the URL:

https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fhg67qgm2ii6f1.png%3Fwidth%3D640%26crop%3Dsmart%26auto%3Dwebp%26s%3D8af0554e2f97a5f9d0d744c5b542071265948a52

the old regex would truncate it to:

https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fhg67qgm2ii6f1.png

which drops the rest of the encoded url parameter and therefore no longer resolves to the image.
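The truncation can be reproduced with a minimal sketch. Note that the pattern below is a hypothetical stand-in that stops at the first image extension; it is an assumption for illustration only and is not the library's actual regex. The names `OLD_STYLE_PATTERN` and `extract_image_old_style` are likewise made up for this example.

```python
import re
from typing import Optional

# Hypothetical pattern (NOT the library's real regex): it matches lazily
# and stops as soon as it sees the first image file extension.
OLD_STYLE_PATTERN = re.compile(r"https?://\S+?\.(?:png|jpe?g|gif|webp)")


def extract_image_old_style(content: str) -> Optional[str]:
    """Return the first image-like URL, cut off right after the extension."""
    match = OLD_STYLE_PATTERN.search(content)
    return match.group(0) if match else None


# The URL from the example above, split only to keep lines short.
FULL_URL = (
    "https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it"
    "%2Fhg67qgm2ii6f1.png%3Fwidth%3D640%26crop%3Dsmart%26auto%3Dwebp"
    "%26s%3D8af0554e2f97a5f9d0d744c5b542071265948a52"
)

# Everything after ".png" is lost, as described in this issue.
truncated = extract_image_old_style(f'<a href="{FULL_URL}">[link]</a>')
```

Here `truncated` ends at `.png`, so the percent-encoded width, crop, auto and s values never reach the caller.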

Fix

  • Added URL decoding: before running the URL regex, we now unescape the most common XML/HTML entities (such as &amp;, &quot;, &lt;, &gt;) so that ampersands and other characters are restored to their literal form.

  • Extended the URL regex to capture query parameters
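
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the actual change: the pattern, the name `extract_image_url`, and the use of Python's `html.unescape` (which decodes a superset of the common entities) are all chosen for this example.

```python
import html
import re
from typing import Optional

# Hypothetical extended pattern: after the image extension it keeps
# consuming characters until whitespace, a quote, or an angle bracket,
# so a trailing query string (plain or percent-encoded) is preserved.
IMAGE_URL_PATTERN = re.compile(
    r"""https?://[^\s"'<>]+?\.(?:png|jpe?g|gif|webp)[^\s"'<>]*""",
    re.IGNORECASE,
)


def extract_image_url(content: str) -> Optional[str]:
    """Unescape entities first, then match the full image URL."""
    unescaped = html.unescape(content)  # "&amp;" -> "&", "&quot;" -> '"', ...
    match = IMAGE_URL_PATTERN.search(unescaped)
    return match.group(0) if match else None


# Item content as it might look after XML parsing: ampersands inside the
# attribute value are still escaped as "&amp;".
content = (
    '<img src="https://preview.redd.it/hg67qgm2ii6f1.png'
    "?width=640&amp;crop=smart&amp;auto=webp"
    '&amp;s=8af0554e2f97a5f9d0d744c5b542071265948a52" />'
)
image_url = extract_image_url(content)

# The percent-encoded Reddit media link from this issue is kept whole too.
media_url = (
    "https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it"
    "%2Fhg67qgm2ii6f1.png%3Fwidth%3D640%26crop%3Dsmart%26auto%3Dwebp"
    "%26s%3D8af0554e2f97a5f9d0d744c5b542071265948a52"
)
full_media_url = extract_image_url(f'<a href="{media_url}">[link]</a>')
```

Unescaping before matching matters because otherwise the captured URL would still contain literal `&amp;` sequences, which most servers do not treat as parameter separators.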

Xiryl, Jun 19 '25 09:06