feedpushr

Some sites parsed incorrectly

Open pztrn opened this issue 2 years ago • 3 comments

Hello, I'm running it in Docker at commit e38a6ee7b2037be20d6e2348dfa57c60551d7c9a with the mail output plugin.

Some sites are parsed incorrectly. For example, new releases from GitHub repositories sometimes appear like this:

[screenshot]

with no actual release information.

Confirmed feeds:

  • GitHub releases.
  • The https://linux.org.ru site (e.g. the https://www.linux.org.ru/section-rss.jsp?section=1 feed).

It happens completely at random: sometimes the feed is parsed normally, and sometimes something like an HTML head ends up in the email (as in the screenshot).

I was using the latest release before, and it was working fine.

pztrn, Jan 01 '23 09:01

Hello, are you using the fetch filter plugin?

ncarlier, Jan 02 '23 13:01

Yes, it is enabled.

pztrn, Jan 02 '23 17:01

The feed itself is parsed correctly, but the "fetch" filter tries to retrieve the HTML content of the original URL (using web scraping techniques). Some websites are not scraped well; it depends mainly on the page structure. I suggest adding a tag only to the feeds you want scraped (e.g. tofetch), then adding a condition to the fetch filter so that it is only activated for that tag (e.g. "tofetch" in Tags).

ncarlier, Jan 02 '23 23:01
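As a rough illustration of the tag-based condition suggested above, here is a minimal Python sketch. It does not use feedpushr's actual API or data model: `Article`, `fetch_filter` and `when_tagged` are hypothetical names, and the fetch step is only a stub standing in for the real scraping. The point it shows is that feeds without the `tofetch` tag keep the content that came from the feed itself, so a badly scraped page can no longer replace their release notes.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List


@dataclass
class Article:
    # Hypothetical stand-in for a feed item; not feedpushr's real data model.
    title: str
    content: str
    tags: List[str] = field(default_factory=list)


Filter = Callable[[Article], Article]


def fetch_filter(article: Article) -> Article:
    # Stub for the scraping step: a real filter would download the article's
    # original URL and replace the content with the extracted page text.
    return replace(article, content="<scraped content>")


def when_tagged(tag: str, inner: Filter) -> Filter:
    """Wrap a filter so it only runs on articles carrying `tag`,
    mirroring a condition such as `"tofetch" in Tags`."""
    def wrapped(article: Article) -> Article:
        if tag in article.tags:
            return inner(article)
        return article  # untouched: keep the content from the feed itself
    return wrapped


conditional_fetch = when_tagged("tofetch", fetch_filter)

# Usage: only the tagged article goes through the (stubbed) fetch step.
release = Article("v1.2.0", "Release notes from the feed", tags=["github"])
post = Article("News", "Summary from the feed", tags=["tofetch"])
print(conditional_fetch(release).content)
print(conditional_fetch(post).content)
```

Wrapping the filter rather than editing it keeps the scraping logic unchanged and moves the per-feed decision into a single predicate, which is the same separation the tag-plus-condition setup gives you in configuration.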