newspaper
Parsing Incorrectly for Yahoo Finance
For article links like:
URL | Action required to view the full text |
---|---|
https://finance.yahoo.com/m/ca7cfbea-6f81-34b7-a482-1f7723376eff/ford-crushes-q3-views-with.html | Read More button |
https://finance.yahoo.com/news/epa-data-shows-tesla-excels-200035538.html | |
https://finance.yahoo.com/news/q3-gdp-gross-domestic-product-usa-coronavirus-pandemic-181533194.html | Story Continues button to view the remainder of the text |
Use case:
# url = 'https://finance.yahoo.com/m/ca7cfbea-6f81-34b7-a482-1f7723376eff/ford-crushes-q3-views-with.html' # needs redirect (clicking on Read More)
url = 'https://finance.yahoo.com/news/epa-data-shows-tesla-excels-200035538.html'
from newspaper import Article
a = Article(url)
a.download()
a.parse()
print(a.text[:300]) # attach a debugger
For the value of `a.text`, I'm seeing it grab the text of a suggested article further down the page rather than the text of the article itself.
Did you solve the issue? Could you tell me how you handled it?
I did solve it partially. I check whether the show more button leads to another site and, if it does, run article parsing on that page. Otherwise, I grab the text and title myself and assign those values to `article.title` and `article.text`.
I only care about text and title. I'll post the implementation in a day or two. I'm using pyquery to do the HTML parsing with CSS selectors.
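A minimal sketch of the first step described above (deciding whether the button leads to another site), not the implementation promised in this comment. It swaps pyquery for the standard library's `html.parser` so that it runs without extra dependencies. The names `ReadMoreFinder` and `external_story_url` are hypothetical, and it assumes the button is rendered as an `<a>` element whose visible text contains "Read More", which may not match Yahoo Finance's actual markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ReadMoreFinder(HTMLParser):
    """Record the href of the first <a> whose visible text contains a label."""

    def __init__(self, label="read more"):
        super().__init__()
        self.label = label.lower()  # assumed button text, taken from the table above
        self.href = None            # href of the matching link, once found
        self._current_href = None   # href of the <a> currently being read
        self._text = []             # text fragments collected inside that <a>

    def handle_starttag(self, tag, attrs):
        # Start collecting text for each link until a match has been found.
        if tag == "a" and self.href is None:
            self._current_href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            if self.label in "".join(self._text).lower():
                self.href = self._current_href
            self._current_href = None


def external_story_url(page_url, html):
    """Return the off-site URL behind the Read More link, or None."""
    finder = ReadMoreFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    target = urljoin(page_url, finder.href)  # resolve relative links
    # Only follow the link when it leaves the host of the starting page.
    if urlparse(target).netloc == urlparse(page_url).netloc:
        return None
    return target
```

If `external_story_url` returns a URL, that URL can be passed to `Article(url)` exactly as in the use case above. If it returns `None`, the page has to be handled by the fallback path, where the title and text are extracted by hand.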
Hey mate, can you share the implementation for the fix? Thank you!
Any updates?