Does not fetch Arabic news

Open moh55m55 opened this issue 3 years ago • 6 comments

Hello, I tried it, but it did not fetch Arabic news from sites such as https://www.alarabiya.net/. I got zero articles.

My code:

import newspaper  # the newspaper3k package is imported under the name "newspaper"
news_paper = newspaper.build('https://www.alarabiya.net/', language='ar', memoize_articles=False)

moh55m55 avatar Jan 16 '21 12:01 moh55m55

Newspaper can obtain article information from the target website, but it needs additional code to get past the "accept all cookies" prompt, which has to be clicked first. Take a look at the examples in my Newspaper3k usage overview document.

johnbumgarner avatar Jan 16 '21 23:01 johnbumgarner

I reviewed the examples but could not figure out how to bypass the cookie prompt. I would appreciate your help.

ghost avatar Jan 17 '21 18:01 ghost

The overview talks about using Selenium to get past the "accept all cookies" prompt on websites that require you to click it before you can access the content. I will look into writing an example for https://www.alarabiya.net, but it will take a couple of days before I can get to it and update the overview document.
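
For readers following along, here is a minimal sketch of the general approach described above. It is not the example from the overview document. It assumes Selenium with Chrome and a matching driver are installed, and it polls for the consent button in a plain loop rather than using Selenium's wait helpers. The function names and the CONSENT_XPATH value are hypothetical placeholders; the real selector has to be found by inspecting the page. The page HTML that the browser ends up with is then handed to Newspaper through Article.download(input_html=...).

```python
import time

# Hypothetical placeholder: inspect the target page to find the real consent button.
CONSENT_XPATH = "//button[contains(., 'Accept')]"


def click_consent_and_get_html(driver, url, consent_xpath=CONSENT_XPATH, timeout=10.0):
    """Load url, click the consent button if it appears in time, return the page HTML."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "xpath" is the string value of selenium's By.XPATH constant.
        buttons = driver.find_elements("xpath", consent_xpath)
        if buttons:
            buttons[0].click()
            break
        time.sleep(0.5)
    return driver.page_source


def parse_with_newspaper(url, html, language="ar"):
    """Parse HTML that was already fetched by the browser, without a second download."""
    from newspaper import Article  # pip install newspaper3k

    article = Article(url, language=language)
    article.download(input_html=html)
    article.parse()
    return article


def demo():
    """Illustrative run; needs selenium, Chrome, and newspaper3k installed."""
    from selenium import webdriver  # pip install selenium

    url = "https://www.alarabiya.net/"  # in practice, pass an individual article URL
    driver = webdriver.Chrome()
    try:
        html = click_consent_and_get_html(driver, url)
    finally:
        driver.quit()
    article = parse_with_newspaper(url, html)
    print(article.title)
```

Calling demo() opens a browser window, so it is left uncalled here. If the consent button never appears within the timeout, the sketch simply returns whatever HTML the page has at that point.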

johnbumgarner avatar Jan 17 '21 20:01 johnbumgarner

Sounds great. I appreciate it.

ghost avatar Jan 17 '21 20:01 ghost

I added a scraping example for the Al Arabiya website to my Newspaper overview document. Please note that I didn't build an entire solution for you. All the information needed to finish the code is in the overview document, and you can add it to the other code yourself. Additionally, you will need to determine which URLs are important to you, because I don't read Arabic, so it's hard for me to pick the correct items. Good luck.

P.S. Don't forget to close this issue, because it has been solved.

johnbumgarner avatar Jan 21 '21 22:01 johnbumgarner

@moh55m55, have you tested the code that I posted on 01-21-2021?

johnbumgarner avatar Apr 15 '21 20:04 johnbumgarner