Feedparser extracts incorrect links

Open • dimuska139 opened this issue 7 years ago • 1 comment

When I try to parse this RSS feed, entry.link doesn't match what I expect. I think Feedparser handles the <yandex:related> block incorrectly.

Here are my code and the RSS content as of the time of writing this issue: code_and_rss.tar.gz

I expect to see the following (these are valid links to the latest posts):

  • http://paleonews.ru/new/1165-300mlnlet
  • http://paleonews.ru/new/1164-mirarce
  • http://paleonews.ru/new/1162-chelonoidis-evol
  • http://paleonews.ru/new/1160-biggest-living-filters
  • http://paleonews.ru/new/1159-razlom
  • http://paleonews.ru/new/1158-eggs
  • http://paleonews.ru/new/1157-blind-vorombe
  • http://paleonews.ru/new/1156-stellerova
  • http://paleonews.ru/new/1155-bug-in-birmit
  • http://paleonews.ru/new/1154-piranhamesodon

Instead I see:

  • https://naked-science.ru/article/sci/v-grand-kanone-nashli-sledy
  • https://nplus1.ru/news/2018/11/13/mirarce-eatoni
  • http://paleonews.ru/new/1162-chelonoidis-evol
  • http://paleonews.ru/index.php
  • https://naked-science.ru/article/sci/paleontologi-obnaruzhili-shest-novyh
  • https://naked-science.ru/article/sci/poyavlenie-okraski-u-ptichih-yaic
  • https://nplus1.ru/news/2018/10/31/blind-Aepyornises
  • https://22century.ru/biology-and-biotechnology/71305
  • https://42.tut.by/613815
  • https://www.sciencemag.org/news/2018/10/piranhalike-teeth-and-torn-fins-reveal-ancient-fish-fight

I am using the latest version of Feedparser (5.2.1).
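A possible workaround, if the cause is indeed extra <link> elements nested inside <yandex:related> as suspected above, is to read each item's own <link> with the standard library instead of entry.link. The sketch below is only an illustration: the embedded feed is a hypothetical reconstruction (the real layout is in the attached archive), the yandex namespace URI and the url attribute are assumptions, and item_links is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; the element layout inside <yandex:related>
# and the namespace URI are assumptions, not taken from the real feed.
SAMPLE_RSS = """<rss version="2.0" xmlns:yandex="http://news.yandex.ru">
  <channel>
    <title>Sample</title>
    <link>http://paleonews.ru/index.php</link>
    <item>
      <title>First post</title>
      <yandex:related>
        <link url="https://naked-science.ru/article/sci/v-grand-kanone-nashli-sledy">Source</link>
      </yandex:related>
      <link>http://paleonews.ru/new/1165-300mlnlet</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://paleonews.ru/new/1164-mirarce</link>
      <yandex:related>
        <link url="https://nplus1.ru/news/2018/11/13/mirarce-eatoni">Source</link>
      </yandex:related>
    </item>
  </channel>
</rss>"""


def item_links(rss_text):
    """Return the text of each <item>'s direct <link> child, in feed order."""
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iter("item"):
        # find("link") only matches direct children of <item>, so <link>
        # elements nested inside <yandex:related> are never considered.
        link = item.find("link")
        if link is not None and link.text:
            links.append(link.text.strip())
    return links


if __name__ == "__main__":
    for url in item_links(SAMPLE_RSS):
        print(url)
```

Because only direct children of each item are inspected, the result does not depend on whether <yandex:related> appears before or after the item's own <link>.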

dimuska139, Nov 19 '18 18:11

Paleonews.ru doesn't exist anymore, but you can find an RSS sample in the attached archive.

This issue is no longer important to me, as I have not developed in Python for a long time. Feel free to close it if you do not consider it relevant.

dimuska139, Feb 17 '25 12:02