
Smart cover image discovering / Images extraction

Open vitonsky opened this issue 7 months ago • 4 comments

Some RSS feeds do not include images. Implement a feature to scan the entry's URL and extract an image. Images are important for understanding the context of posts.

vitonsky avatar May 30 '25 12:05 vitonsky

Hi!

Just to clarify, you mean automatically discovering a header/cover image from the original page of the news item? I.e. "smart cover image discovering", not some specific markup that contains a URL to the image in an RSS or ATOM feed.

Tiendil avatar May 30 '25 13:05 Tiendil

@Tiendil you're correct. "Smart cover image discovering" describes the feature perfectly.

Some ideas about heuristics that may be used to find the image:

  • check meta tags in head. It's weird, but some sites don't put the image in the RSS entry, yet provide it for search engines and social media. Maybe because their RSS generators are misconfigured
  • find the block whose text is the same as (or most similar to) the body of the RSS entry and check nearby nodes for images
  • search for semantic tag names and structures. Something like the first selector that matches main article img
  • additional checks on image sizes (to find the largest image)
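
A rough sketch of how the first and last ideas could fit together, using only the Python standard library. Everything here is an assumption for illustration: `find_cover_image`, `CoverImageParser`, and the `META_KEYS` priority list are hypothetical names, not existing feeds.fun code, and "largest" is approximated by the `width`/`height` attributes declared in the markup rather than by downloading the images.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Meta keys checked in priority order (an assumed list, not project code).
META_KEYS = ("og:image", "og:image:url", "twitter:image", "twitter:image:src")


def _declared_area(attrs):
    """Area from the width/height attributes; 0 if missing or not plain integers."""
    try:
        return int(attrs.get("width", 0)) * int(attrs.get("height", 0))
    except (TypeError, ValueError):
        return 0


class CoverImageParser(HTMLParser):
    """Collects social-media meta images and <img> candidates from a page."""

    def __init__(self):
        super().__init__()
        self.meta_images = {}     # meta key -> image URL (first occurrence wins)
        self.img_candidates = []  # (declared area, src) in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            content = attrs.get("content")
            if key in META_KEYS and content and key not in self.meta_images:
                self.meta_images[key] = content
        elif tag == "img" and attrs.get("src"):
            self.img_candidates.append((_declared_area(attrs), attrs["src"]))


def find_cover_image(html, base_url):
    """Return an absolute cover image URL for the page, or None if nothing is found."""
    parser = CoverImageParser()
    parser.feed(html)

    # Heuristic 1: images declared for search engines and social media.
    for key in META_KEYS:
        if key in parser.meta_images:
            return urljoin(base_url, parser.meta_images[key])

    # Heuristic 4 (fallback): the <img> with the largest declared area;
    # when no sizes are declared, this is simply the first image on the page.
    if parser.img_candidates:
        _, src = max(parser.img_candidates, key=lambda c: c[0])
        return urljoin(base_url, src)

    return None
```

The text-similarity and semantic-selector ideas are not covered here; they would need a real DOM tree rather than a streaming parser.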

vitonsky avatar May 30 '25 13:05 vitonsky

Good idea, worth implementing.

Currently, I cannot provide an estimated timeline, but I plan to prepare a roadmap of significant features for the project, and this one will be added as one of the subfeatures.

Tiendil avatar May 30 '25 13:05 Tiendil

This task is related to gh-357 and gh-351.

Tiendil avatar Jun 08 '25 15:06 Tiendil