newspaper4k
Remove extra sections like 'Also read'
Issue by deepshah
Thu May 26 09:50:39 2016
Originally opened as https://github.com/codelucas/newspaper/issues/257
Some articles (e.g. http://indiatoday.intoday.in/story/trupti-desai-kapleshwar-temple-bhumata-brigade-nashik/1/677723.html) have sections like 'Also read' and 'Read more' at the end of the article. How can we remove them?
Comment by yprez
Mon May 30 19:01:24 2016
There was some code in cleaners.py that cleans up divs containing this sort of thing. But in this case the section looks just like a regular paragraph, without any id or class to match on. I don't know how it could be removed without messing up other cases...
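One possible workaround is to filter the extracted text after parsing rather than changing the cleaner. The sketch below is not part of newspaper; `strip_trailers` and `TRAILER_PATTERN` are hypothetical names, the phrase list is only the two examples from this issue, and it assumes paragraphs in the extracted text are separated by blank lines. It drops any paragraph that starts with one of the phrases, so it carries the same risk mentioned above: a genuine paragraph that opens with "Read more" would be removed too.

```python
import re

# Hypothetical phrase list taken from this issue; extend it per site.
# The trailing \b keeps words such as "Also reading" from matching.
TRAILER_PATTERN = re.compile(r"^\s*(also read|read more)\b", re.IGNORECASE)


def strip_trailers(text: str) -> str:
    """Drop paragraphs that begin with a known trailer phrase.

    Assumes paragraphs are separated by a blank line ("\n\n").
    """
    paragraphs = text.split("\n\n")
    kept = [p for p in paragraphs if not TRAILER_PATTERN.match(p)]
    return "\n\n".join(kept)
```

It could be applied to the text of a parsed article, for example `strip_trailers(article.text)`. Because the match is anchored to the start of a paragraph, sentences that merely contain the phrase are left alone.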