NYT scraper no longer works

Open iamvishnurajan opened this issue 6 years ago • 3 comments

I noticed that on or about May 8, 2018, the NYT scraper stopped working. I'm not very experienced with parsing HTML in Python, but it looks like the markup of NYT articles and the way the different fields are tagged have changed. I will try to figure it out myself, but if anyone has expertise here, any assistance would be much appreciated.

iamvishnurajan avatar May 28 '18 17:05 iamvishnurajan

Thanks! You should take a look at pull request #49, which takes a stab at it but isn't quite right for all articles.

ecprice avatar May 28 '18 19:05 ecprice

Thank you very much! I had not seen that. I will check it out now.

iamvishnurajan avatar May 28 '18 19:05 iamvishnurajan

Working from pull request #49, I was able to create an NYT parser that seems to work for me. My pull request, which builds on #49, is here: https://github.com/carlgieringer/newsdiffs/pull/1.

iamvishnurajan avatar May 28 '18 23:05 iamvishnurajan