newspaper Not working on "nytimes.com"

I tried a few articles from NYTimes.com, but newspaper parses only the second half of each article and misses the first half. Example URLs: url 1 url2

May 04 '17 19:05 Praveena0989

Did you check the website to make sure that you haven't reached the max free articles that you are allowed to see for the month?

May 24 '17 15:05 dlundergreen

@dlundergreen I don't remember the last time I opened NYTimes before this, so I am sure I have not crossed the limit.

May 24 '17 17:05 Praveena0989

This also happens for other links. For example, on this URL only part of the body is parsed. Is this because the individual <p> elements are in different parent <div>s?
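One way to check that suspicion on a saved page is to count how many <p> elements sit under each <div>. The following is only a diagnostic sketch using the Python standard library; the class name, the helper name and the sample markup are made up for illustration and are not part of newspaper.

```python
from collections import Counter
from html.parser import HTMLParser


class ParagraphParents(HTMLParser):
    """Count <p> elements per nearest enclosing <div> (labelled by id or class)."""

    def __init__(self):
        super().__init__()
        self.div_stack = []      # labels of the currently open <div> elements
        self.counts = Counter()  # label of nearest <div> -> number of <p>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            self.div_stack.append(attrs.get("id") or attrs.get("class") or "?")
        elif tag == "p":
            parent = self.div_stack[-1] if self.div_stack else "(no div)"
            self.counts[parent] += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def paragraphs_per_div(html):
    """Return a dict mapping each <div> label to its number of <p> elements."""
    parser = ParagraphParents()
    parser.feed(html)
    return dict(parser.counts)


# Hypothetical markup: the article body is split over two sibling <div>s.
sample = """
<article>
  <div id="intro"><p>one</p><p>two</p></div>
  <div id="rest"><p>three</p><p>four</p><p>five</p></div>
</article>
"""
print(paragraphs_per_div(sample))
```

If the paragraphs of one article show up under more than one label, the body is split across containers in the way asked about here.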

Jul 05 '17 19:07 sskadamb

NYTimes articles are split over two DIVs, and the second one is generally bigger, so newspaper picks it as the article body.

Sep 05 '17 09:09 Cabu

Was anyone able to solve this?

Nov 05 '18 10:11 ghost

I found that changing PARENT_DECAY to 1.0 makes it work for NYT.
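To illustrate why a decay of 1.0 could help, here is a toy model, not newspaper's actual code. It assumes a scorer in which every paragraph adds its full score to its parent and a decayed share to its grandparent; the function name, the `parent_decay` argument and the node names are all hypothetical.

```python
def pick_best_node(parents, paragraph_scores, parent_decay=0.5):
    """Toy scorer: a paragraph gives its full score to its parent and
    parent_decay times its score to its grandparent."""
    totals = {}
    for node, score in paragraph_scores.items():
        parent = parents.get(node)
        if parent is None:
            continue
        totals[parent] = totals.get(parent, 0.0) + score
        grandparent = parents.get(parent)
        if grandparent is not None:
            totals[grandparent] = totals.get(grandparent, 0.0) + parent_decay * score
    # Return the highest-scoring node together with all totals.
    return max(totals, key=totals.get), totals


# Hypothetical layout: the article is split into a short and a long <div>.
parents = {
    "p1": "div_a", "p2": "div_a", "p3": "div_a",
    "p4": "div_b", "p5": "div_b", "p6": "div_b", "p7": "div_b", "p8": "div_b",
    "div_a": "article", "div_b": "article",
}
scores = {f"p{i}": 1.0 for i in range(1, 9)}

print(pick_best_node(parents, scores, parent_decay=0.5)[0])  # div_b
print(pick_best_node(parents, scores, parent_decay=1.0)[0])  # article
```

In this toy setup a decay of 0.5 leaves the common parent with 4 points against 5 for the longer <div>, so only the second half is picked. With a decay of 1.0 the common parent collects all 8 points and wins, which matches the behaviour reported for NYT.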

Nov 05 '18 20:11 Cabu

@Cabu I couldn't find a variable named PARENT_DECAY on the master branch, so where is it located?

Nov 06 '18 09:11 ghost

@loaighoraba paper = newspaper.build(source_url, PARENT_DECAY=1.0)

Nov 06 '18 10:11 Cabu

@Cabu It seems this has changed in the master branch; there is no such variable.

Nov 06 '18 10:11 ghost

@loaighoraba Oh yes, I see. It now seems to be hardcoded in extractor.py at line 825 :/ Having it as a 'hidden' feature was practical for sources like the NYT.

Nov 06 '18 13:11 Cabu

@Cabu I see. However, this won't solve the issue if the common parent is more than two levels up. Thanks for this anyway.
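The limitation can be shown with a small toy model, again not newspaper's real implementation. It assumes scores are propagated only to a fixed number of ancestors; `score_ancestors`, `decay` and `max_levels` are hypothetical names used purely for illustration.

```python
def score_ancestors(parents, paragraph_scores, decay=1.0, max_levels=2):
    """Toy propagation: each paragraph scores its nearest max_levels ancestors,
    multiplying the contribution by decay for every level above the parent."""
    totals = {}
    for node, score in paragraph_scores.items():
        ancestor, weight = parents.get(node), 1.0
        for _ in range(max_levels):
            if ancestor is None:
                break
            totals[ancestor] = totals.get(ancestor, 0.0) + weight * score
            ancestor, weight = parents.get(ancestor), weight * decay
    return totals


# Hypothetical layout: the common ancestor "article" is three levels above the paragraphs.
parents = {
    "p1": "div_a", "p2": "div_a", "p3": "div_a",
    "p4": "div_b", "p5": "div_b", "p6": "div_b", "p7": "div_b", "p8": "div_b",
    "div_a": "section_a", "div_b": "section_b",
    "section_a": "article", "section_b": "article",
}
scores = {f"p{i}": 1.0 for i in range(1, 9)}

two = score_ancestors(parents, scores, decay=1.0, max_levels=2)
three = score_ancestors(parents, scores, decay=1.0, max_levels=3)
print("article" in two)              # False: the common ancestor is never scored
print(max(three, key=three.get))     # article
```

With two levels of propagation the common ancestor receives no score at all, so no decay value can make it win. Only when the propagation reaches a third level does it collect the paragraphs from both halves.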

Nov 06 '18 14:11 ghost

Not sure if anyone is watching for updates on this issue but my linked PR has been tested with both URLs here. Happy to hear feedback/suggestions on it 👍🏽

May 10 '21 05:05 jecarr
