Scraping several sites at the same time

Open racindustries opened this issue 7 years ago • 12 comments

When running the code, only the first news website in the JSON list seems to be downloaded and parsed. Do you have any suggestions?

racindustries avatar Jul 27 '18 06:07 racindustries

@racindustries Same here. I've looked around to see if anyone has a solution and haven't found one. Were you able to work around this issue?

ghost avatar Aug 30 '18 05:08 ghost

Same here, I also need help.

Susmithap3 avatar Sep 30 '18 19:09 Susmithap3

Can any of you please share the code that you're using? I used the code from this repo and it worked fine with the JSON list I provided.

ivanovishado avatar Oct 22 '18 06:10 ivanovishado

It's been a while on my end, but I used the exact same code from holwech and only changed the news sites.

ghost avatar Oct 22 '18 16:10 ghost

@Civmwa can you please share the JSON list you used to see if I can reproduce the error?

ivanovishado avatar Oct 23 '18 02:10 ivanovishado

@ivanovishado

{
  "The Standard": {
    "link": "https://www.standardmedia.co.ke/business"
  },
  "bbc": {
    "rss": "http://feeds.bbci.co.uk/news/rss.xml",
    "link": "http://www.bbc.com/"
  },
  "theguardian": {
    "rss": "https://www.theguardian.com/uk/rss",
    "link": "https://www.theguardian.com/international"
  },
  "breitbart": {
    "link": "http://www.breitbart.com/"
  },
  "infowars": {
    "link": "https://www.infowars.com/"
  },
  "foxnews": {
    "link": "http://www.foxnews.com/"
  },
  "nbcnews": {
    "link": "http://www.nbcnews.com/"
  },
  "washingtonpost": {
    "rss": "http://feeds.washingtonpost.com/rss/world",
    "link": "https://www.washingtonpost.com/"
  }
}

ghost avatar Oct 24 '18 09:10 ghost
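A quick way to check that every site in a list like the one above is actually picked up is to load the JSON and walk all of its entries before any downloading happens. The sketch below is only an illustration, not the repo's code: `list_sources` is a hypothetical helper, and the choice to prefer the "rss" URL over "link" when both are present is an assumption.

```python
import json


def list_sources(config_text):
    """Return one (name, kind, url) tuple per site in a JSON site list.

    Hypothetical helper for illustration; assumes each entry has a
    "link" key and optionally an "rss" key, as in the list above.
    """
    companies = json.loads(config_text)
    sources = []
    for name, entry in companies.items():
        # Assumption: use the RSS feed when one is given, else the front page.
        if "rss" in entry:
            sources.append((name, "rss", entry["rss"]))
        else:
            sources.append((name, "link", entry["link"]))
    return sources
```

If the returned list has one tuple per site, the JSON itself is fine and the problem is more likely in the download step (for example a site timing out or blocking the request) than in the list.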

@Civmwa NewsScraper.py worked fine for me; here's the output file as proof: https://pastebin.com/ndLPb7QL

Tested on Windows 10, Python 3.6.2.

ivanovishado avatar Oct 25 '18 04:10 ivanovishado

Hi Ivan, I'm not entirely sure what happened between when I sent it to you and now, but I ran it and it works. LOL. One small issue though: how would I get it to print a summary of the article?

ghost avatar Oct 25 '18 05:10 ghost

> Not entirely sure what happened between when i sent it to you and now, but i ran it and it works. LOL.

@Civmwa lol

> how would i get to print a summary of the article?

You need to add content.nlp() just after content.parse(); then you can read content.summary. Keep in mind that nlp() adds some processing time and the summary won't be perfect.

ivanovishado avatar Oct 26 '18 03:10 ivanovishado
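The order of calls described above can be sketched roughly as follows. This assumes `content` is an article object exposing `download()`, `parse()`, `nlp()` and the `title`, `text` and `summary` attributes, as the `newspaper` library's `Article` does; `parse_with_summary` is a hypothetical name, not something from the repo.

```python
def parse_with_summary(content):
    """Download and parse an article, then run the NLP step for a summary.

    Hypothetical helper; `content` is assumed to behave like a
    newspaper Article object.
    """
    content.download()
    content.parse()
    # nlp() has to run after parse(); it fills in content.summary
    # and adds some processing time per article.
    content.nlp()
    return {
        "title": content.title,
        "text": content.text,
        "summary": content.summary,
    }
```

With the `newspaper` package installed, this would be called as `parse_with_summary(Article(url))`, and the "summary" value could then be printed or written to the output file next to the article text.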

Thanks, Ivan. Much appreciated.

ghost avatar Oct 27 '18 18:10 ghost

@Civmwa You're welcome. I believe this issue can be closed now, @racindustries.

ivanovishado avatar Oct 29 '18 01:10 ivanovishado

Yes.

ghost avatar Oct 29 '18 04:10 ghost