newspaper4k
newspaper cannot parse or encode error
Issue by Kyeongpil
Fri Feb 10 11:29:29 2017
Originally opened as https://github.com/codelucas/newspaper/issues/331
Hi, I am trying to crawl Korean newspapers.
Most news articles are crawled and parsed well, but for some articles, such as the URL below, newspaper cannot parse the Korean text correctly. http://www.edaily.co.kr/news/newspath.asp?newsid=01610486605953456
After parsing the article, the result is as shown in the picture below.
(Parsed title should be "금감원, 소비자단체와 금융현장서 민원 상담·구제")

As a.title shows, I think newspaper has an encoding problem.
What do you think of this problem?
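
A possible way to narrow this down is to decode the page bytes yourself and compare the result with what newspaper produces. The sketch below is only an illustration under assumptions: `decode_html` is a hypothetical helper (not part of newspaper), and the guess that the affected pages are served in a legacy Korean charset such as EUC-KR/CP949 rather than UTF-8 has not been verified for the URL above.

```python
import re

# Hypothetical helper, not part of newspaper.
# Looks for a charset declared in a <meta> tag near the top of the document.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset=["']?\s*([A-Za-z0-9_-]+)""", re.IGNORECASE
)


def decode_html(raw: bytes, fallbacks=("utf-8", "cp949")) -> str:
    """Decode raw HTML bytes using the declared charset, then the fallbacks.

    The fallback order is an assumption: UTF-8 first, then CP949
    (a superset of EUC-KR commonly used by older Korean sites).
    """
    candidates = []
    match = _META_CHARSET.search(raw[:4096])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)

    for name in candidates:
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes invalid in this codec: try the next.
            continue

    # Last resort: keep the text but mark undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

If the string returned here shows the title correctly while `a.title` does not, that would point at charset detection rather than at the parser. As far as I know, `Article.download()` accepts an `input_html` argument, so the decoded string could be passed in as a workaround, but please check this against the version you have installed.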