Not working on "nytimes.com"

Open Praveena0989 opened this issue 7 years ago • 12 comments

I tried a few articles from NYTimes.com, but it only parses half of each article and misses the first half. Example URLs: url 1 url2

Praveena0989 avatar May 04 '17 19:05 Praveena0989

Did you check the website to make sure that you haven't reached the max free articles that you are allowed to see for the month?

dlundergreen avatar May 24 '17 15:05 dlundergreen

@dlundergreen I don't remember the last time I opened NYTimes, so I am sure I have not crossed the limit.

Praveena0989 avatar May 24 '17 17:05 Praveena0989

This also happens for other links. For example, on this URL only a part of the body is parsed. Is this because the individual <p> elements are in different parent <div>s?

sskadamb avatar Jul 05 '17 19:07 sskadamb

NYTimes articles are split over 2 DIVs, and generally the second one is bigger, which makes newspaper pick it.
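To illustrate the effect, here is a toy sketch, not newspaper's actual extractor code. It assumes a simplified scorer that gives each direct parent of a <p> a score equal to the word count of its paragraphs and then keeps only the single best-scoring node. The markup, ids, and function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical article body split across two sibling <div>s.
HTML = """
<article>
  <div id="part1">
    <p>First paragraph of the story.</p>
    <p>Second paragraph of the story.</p>
  </div>
  <div id="part2">
    <p>Third paragraph of the story.</p>
    <p>Fourth paragraph of the story.</p>
    <p>Fifth paragraph of the story.</p>
  </div>
</article>
"""


def score_parents(root):
    """Score each direct parent of a <p> by the total word count of its paragraphs."""
    scores = {}
    for parent in root.iter():
        for child in parent:
            if child.tag == "p":
                words = len((child.text or "").split())
                scores[parent] = scores.get(parent, 0) + words
    return scores


def extract_top_node(root):
    """Keep only the single best-scoring node, as a naive extractor would."""
    scores = score_parents(root)
    top = max(scores, key=scores.get)
    return top.get("id"), [p.text for p in top.findall("p")]


root = ET.fromstring(HTML)
top_id, paragraphs = extract_top_node(root)
print(top_id, len(paragraphs))
```

With this input the larger second <div> wins, so the two paragraphs in the first <div> are dropped, which matches the "missing first half" symptom reported above.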

Cabu avatar Sep 05 '17 09:09 Cabu

Was anyone able to solve this?

ghost avatar Nov 05 '18 10:11 ghost

I found that changing PARENT_DECAY to 1.0 makes it work for NYT.

Cabu avatar Nov 05 '18 20:11 Cabu

@Cabu I couldn't find a variable named PARENT_DECAY on the master branch, so where is it located?

ghost avatar Nov 06 '18 09:11 ghost

@loaighoraba paper = newspaper.build(source_url, PARENT_DECAY=1.0)

Cabu avatar Nov 06 '18 10:11 Cabu

@Cabu It seems this has changed in the master branch; there is no such variable.

ghost avatar Nov 06 '18 10:11 ghost

@loaighoraba Oh yes, I see. It now seems to be hardcoded in extractor.py line 825 :/ Having it as a 'hidden' feature was practical for sources like the NYT.

Cabu avatar Nov 06 '18 13:11 Cabu

@Cabu I see. However, this won't solve the issue if the common parent is more than two levels up. Thanks for this anyway.
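A rough sketch of that limitation, under the assumption that a paragraph's score is passed only to its parent and, scaled by a decay factor, to its grandparent. This is a simplified model, not newspaper's real code; `grandparent_weight` is a hypothetical stand-in for the decay factor discussed above, and the markup is invented.

```python
import xml.etree.ElementTree as ET

P = "<p>one two three four five</p>"

# Common ancestor of both halves is the grandparent of the paragraphs.
SHALLOW = (
    '<article id="root">'
    f'<div id="part1">{P}{P}</div>'
    f'<div id="part2">{P}{P}{P}</div>'
    "</article>"
)

# Common ancestor is three levels above the paragraphs.
DEEP = (
    '<article id="root">'
    f'<section id="s1"><div id="part1">{P}{P}</div></section>'
    f'<section id="s2"><div id="part2">{P}{P}{P}</div></section>'
    "</article>"
)


def best_node(html, grandparent_weight):
    """Return (id, paragraph count) of the best-scoring node in this toy model."""
    root = ET.fromstring(html)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    scores = {}
    for p in root.iter("p"):
        words = len((p.text or "").split())
        parent = parent_of[p]
        scores[parent] = scores.get(parent, 0) + words
        grandparent = parent_of.get(parent)
        if grandparent is not None:
            # The score never travels further up than the grandparent.
            scores[grandparent] = scores.get(grandparent, 0) + words * grandparent_weight
    top = max(scores, key=scores.get)
    return top.get("id"), len(list(top.iter("p")))


shallow_half = best_node(SHALLOW, 0.5)
shallow_full = best_node(SHALLOW, 1.0)
deep_full = best_node(DEEP, 1.0)
print(shallow_half, shallow_full, deep_full)
```

In this model, a weight of 1.0 lets the shared grandparent win in the shallow layout, so all five paragraphs are kept. In the deeper layout the shared ancestor never receives any score, so only the three paragraphs of the second part are selected regardless of the weight.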

ghost avatar Nov 06 '18 14:11 ghost

Not sure if anyone is watching for updates on this issue but my linked PR has been tested with both URLs here. Happy to hear feedback/suggestions on it 👍🏽

jecarr avatar May 10 '21 05:05 jecarr