
Parsing Incorrectly for Yahoo Finance

Open • jtara1 opened this issue 3 years ago • 4 comments

For article links like these:

| URL | Action required to view the full article |
|---|---|
| https://finance.yahoo.com/m/ca7cfbea-6f81-34b7-a482-1f7723376eff/ford-crushes-q3-views-with.html | Read More button |
| https://finance.yahoo.com/news/epa-data-shows-tesla-excels-200035538.html | |
| https://finance.yahoo.com/news/q3-gdp-gross-domestic-product-usa-coronavirus-pandemic-181533194.html | Story Continues button to view the remainder of the text |

Use case:

```python
from newspaper import Article

# url = 'https://finance.yahoo.com/m/ca7cfbea-6f81-34b7-a482-1f7723376eff/ford-crushes-q3-views-with.html'  # needs redirect (clicking on Read More)
url = 'https://finance.yahoo.com/news/epa-data-shows-tesla-excels-200035538.html'

a = Article(url)
a.download()
a.parse()
print(a.text[:300])  # inspect the start of the parsed text (or attach a debugger)
```

Instead of the article's own body, `a.text` contains the text of a suggested article further down the page.

jtara1 • Oct 29 '20 00:10

Did you solve the issue? Could you tell me how you handled it?

sysmetic • Dec 18 '20 08:12

I solved it partially. I check whether the "show more" button leads to another site, and if it does, I run article parsing on that page. Otherwise, I grab the text and title myself and assign those values to `article.title` and `article.text`.
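A minimal sketch of the "does the button lead to another site?" check, using only the standard library. It assumes the button is an `<a>` whose visible text is "Read More" or "Story Continues" (the labels from the table above) and that "another site" means a different host; `ContinueLinkFinder` and `external_article_url` are hypothetical helpers, not part of newspaper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical button labels taken from the issue; Yahoo's real markup may differ.
CONTINUE_LABELS = {"read more", "story continues"}


class ContinueLinkFinder(HTMLParser):
    """Records the href of the first <a> whose visible text is a continue label."""

    def __init__(self):
        super().__init__()
        self._href = None  # href of the <a> currently being read, if any
        self._text = []    # text chunks seen inside that <a>
        self.found = None  # href of the first matching link

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.found is None:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "Read\n  More" still matches.
            label = " ".join("".join(self._text).split()).lower()
            if label in CONTINUE_LABELS:
                self.found = self._href
            self._href = None


def external_article_url(page_url, html):
    """Return the continue link's absolute URL if it points to another host, else None."""
    finder = ContinueLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.found is None:
        return None
    target = urljoin(page_url, finder.found)  # resolve relative hrefs against the page
    if urlparse(target).netloc != urlparse(page_url).netloc:
        return target
    return None


page = "https://finance.yahoo.com/m/abc/example.html"
print(external_article_url(page, '<a href="https://www.example.com/full">Read More</a>'))
# https://www.example.com/full
```

When it returns a URL, that page can be passed to `Article(...)` and parsed as in the use-case snippet; when it returns `None`, the title and text have to be taken from the current page instead.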

I only care about the text and title. I'll post the implementation in a day or two. I'm using pyquery to parse the HTML with CSS selectors.
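Here is a sketch of the "otherwise" branch, pulling the title and body text straight from the page. The commenter used pyquery with CSS selectors (roughly `PyQuery(html)('h1').text()` for the title and `PyQuery(html)('.article-body p')` for the paragraphs); this version swaps in the standard library's `html.parser` so it runs without extra dependencies. The `article-body` container class is a hypothetical placeholder, not Yahoo's actual markup, so the real class has to be found by inspecting the page.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collects the first <h1> text and the <p> texts inside one container element."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self._container_tag = None  # tag name of the container we are inside, if any
        self._depth = 0             # nesting depth of that tag name inside the container
        self._in_title = False
        self._title_parts = []
        self._para = None           # text chunks of the <p> being read, or None
        self.title = None
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if self._container_tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.container_class in classes:
                self._container_tag, self._depth = tag, 1
        elif tag == self._container_tag:
            self._depth += 1
        if tag == "h1" and self.title is None:
            self._in_title, self._title_parts = True, []
        if tag == "p" and self._container_tag is not None:
            self._para = []

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)
        if self._para is not None:
            self._para.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_title:
            self.title = " ".join("".join(self._title_parts).split())
            self._in_title = False
        if tag == "p" and self._para is not None:
            text = " ".join("".join(self._para).split())
            if text:
                self.paragraphs.append(text)
            self._para = None
        if self._container_tag is not None and tag == self._container_tag:
            self._depth -= 1
            if self._depth == 0:  # left the container; ignore later <p> tags
                self._container_tag = None


def extract_title_and_text(html, container_class="article-body"):
    """Return (title, text); 'article-body' is a hypothetical class name."""
    parser = ArticleExtractor(container_class)
    parser.feed(html)
    parser.close()
    return parser.title, "\n\n".join(parser.paragraphs)


sample_html = ('<h1>Example headline</h1>'
               '<div class="article-body"><p>First paragraph.</p><p>Second one.</p></div>'
               '<p>Suggested article teaser</p>')
title, text = extract_title_and_text(sample_html)
print(title)                 # Example headline
print(text.splitlines()[0])  # First paragraph.
```

The returned values can then be assigned to `article.title` and `article.text`, as described above. Restricting paragraphs to a single container is what should keep teasers for suggested articles lower on the page out of the text.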

jtara1 • Dec 18 '20 18:12


Hey mate, can you share the implementation for the fix? Thank you!

DrElyt • Feb 16 '21 23:02

Any updates?

calpa • Mar 01 '21 15:03