newspaper Parsing Incorrectly for Yahoo Finance

Open jtara1 opened this issue 3 years ago • 4 comments

For article links like these:

URL	Action required to view the full text
https://finance.yahoo.com/m/ca7cfbea-6f81-34b7-a482-1f7723376eff/ford-crushes-q3-views-with.html	Read More button
https://finance.yahoo.com/news/epa-data-shows-tesla-excels-200035538.html
https://finance.yahoo.com/news/q3-gdp-gross-domestic-product-usa-coronavirus-pandemic-181533194.html	Story Continues button to view the remainder of the text

Use case:

from newspaper import Article

# url = 'https://finance.yahoo.com/m/ca7cfbea-6f81-34b7-a482-1f7723376eff/ford-crushes-q3-views-with.html'  # needs redirect (clicking on Read More)
url = 'https://finance.yahoo.com/news/epa-data-shows-tesla-excels-200035538.html'

a = Article(url)
a.download()
a.parse()
print(a.text[:300])  # attach a debugger here to inspect a.text

The value of a.text ends up being the text of a suggested article further down the page, not the text of the article at the URL.

Oct 29 '20 00:10 jtara1

Did you solve the issue? Could you tell me how to handle it?

Dec 18 '20 08:12 sysmetic

I did solve it partially. I'm checking whether the show more button leads to another site and, if it does, running article parsing on that page. Otherwise, I'm grabbing the text and title myself and assigning those values to article.title and article.text.

I only care about text and title. I'll post the implementation in a day or two. I'm using pyquery to do the HTML parsing with CSS selectors.
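
Since the implementation was never posted in this thread, here is a rough sketch of the first half of that idea only: detecting a "Read More" link that points off-site and parsing that page instead. It is not jtara1's code. It swaps pyquery for the standard library's html.parser so it has no extra dependencies, and it assumes the button is a plain <a> element whose visible text contains "Read More", which may not match Yahoo's actual markup. The names find_offsite_read_more and parse_yahoo_article are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._chunks = []   # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._chunks).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def find_offsite_read_more(html, page_url, label="read more"):
    """Return the absolute URL of an off-site link labelled `label`, or None.

    Assumption: the button is an <a> whose text contains `label`.
    """
    collector = _LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    for href, text in collector.links:
        if label in text.lower():
            target = urljoin(page_url, href)
            # An empty host means a non-HTTP href such as "javascript:...".
            if urlparse(target).netloc not in ("", page_host):
                return target
    return None


def parse_yahoo_article(url):
    """Parse `url`, following an off-site "Read More" link when one is found."""
    from newspaper import Article  # third-party; imported lazily

    article = Article(url)
    article.download()
    target = find_offsite_read_more(article.html, url)
    if target is not None:
        # The full story lives on another site, so parse that page instead.
        article = Article(target)
        article.download()
    article.parse()
    # Fallback not shown: the CSS selectors for Yahoo's own title and body
    # were never posted in this thread, so nothing is overridden here.
    return article
```

For pages where no off-site link is found, the approach described above would then overwrite article.title and article.text with values pulled out by CSS selectors. Those selectors are not given anywhere in this thread, so that part is left out rather than guessed.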

Dec 18 '20 18:12 jtara1

Hey mate, can you share the implementation for the fix? Thank you!

Feb 16 '21 23:02 DrElyt

Any updates?

Mar 01 '21 15:03 calpa
