Running in Amazon Lambda
Issue by JamesChevalier
Tue Apr 26 00:42:00 2016
Originally opened as https://github.com/codelucas/newspaper/issues/237
I got this running in an Amazon Lambda function, and I wanted to share how I did it just in case it was useful for others.
This gist covers the details, including the Lambda function itself. The one huge caveat is that this only applies to the Python 2.7 version, because that's what Amazon Lambda provides.
I wasn't sure where to put this, since it isn't really an Issue (in the problematic sense). I also didn't want to be the first person to add to the Wiki, especially with something so specific.
Comment by yprez
Thu May 5 08:28:22 2016
@JamesChevalier thanks, looks useful! Not sure where to put this either...
Comment by bisoldi
Thu May 4 01:10:53 2017
Lambda now supports Python 3.6. Any thoughts on how to get Newspaper 3 deployed to AWS Lambda? I've been trying to figure out how to build Newspaper 3 on an EC2 instance (with the same AMI as Lambda); however, Amazon Linux doesn't come with Python 3.6 and I can't get it installed. Unfortunately, that's the extent of my Python knowledge...
If you have any thoughts / suggestions, I'm happy to continue working on it as I would love to get it running as a standalone service.
Thanks!
Comment by JamesChevalier
Thu May 4 02:02:19 2017
What trouble(s) are you running into when attempting to install Python 3.6 in Amazon Linux?
Another approach that might work is to try doing the build process through LambCI's Lambda Docker image: https://hub.docker.com/r/lambci/lambda/
Comment by bisoldi
Thu May 4 02:18:41 2017
Thanks for responding. Well, the AMI doesn't have Python 3.6 in the yum repository and I haven't found any instructions on how to install it without yum. It has 3.4 and 3.5 but I wasn't sure if building against either would work in a 3.6 runtime.
Comment by vitaly-zdanevich
Mon Jun 12 09:44:15 2017
AWS Lambda can write only to /tmp, so in settings.py we need to change DATA_DIRECTORY from .newspaper_scraper to /tmp/.newspaper_scraper. Also, I do not know how to determine from Python that we are running inside AWS Lambda. Maybe check for an environment variable like AWS_LAMBDA_FUNCTION_NAME?
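A minimal sketch of that check, assuming the presence of AWS_LAMBDA_FUNCTION_NAME is a good enough signal. The helper names `running_in_lambda` and `data_directory` are hypothetical and are not part of newspaper, and the home-directory fallback outside Lambda is only an assumption for illustration:

```python
import os

# Directory name taken from the comment above.
DATA_DIRECTORY_NAME = ".newspaper_scraper"


def running_in_lambda(environ=os.environ):
    """Return True when the AWS_LAMBDA_FUNCTION_NAME variable is present."""
    return "AWS_LAMBDA_FUNCTION_NAME" in environ


def data_directory(environ=os.environ):
    """Pick a base directory that should be writable for the data directory."""
    if running_in_lambda(environ):
        # Inside Lambda only /tmp is writable.
        base = "/tmp"
    else:
        # Assumption for this sketch: fall back to the user's home directory.
        base = os.path.expanduser("~")
    return os.path.join(base, DATA_DIRECTORY_NAME)
```

Passing the environment in as a parameter keeps the check easy to try locally, for example `data_directory({"AWS_LAMBDA_FUNCTION_NAME": "demo"})`.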
Comment by bisoldi
Tue Jun 13 01:40:46 2017
I was finally able to deploy newspaper3k to AWS Lambda via CodeBuild --> CloudFormation; however, I can only get the download() and parse() functions to work. Calling nlp() throws an SQLite error from the NLTK library. I've done some searching and communicated with AWS about this, and it appears that SQLite is expected to be embedded within Python and the Python 3.6 runtime on Lambda does not have it. I've tried compiling and building SQLite into my app, but that didn't work. I've filed a request with AWS to both create an AMI with a Python 3.6 environment for CodeBuild and to embed SQLite into the Python 3.6 runtime.
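For anyone hitting the same wall, here is a rough sketch of a handler that still returns the download() and parse() results when nlp() fails. It is not the code from this deployment. The `article_factory` parameter and the `"url"` event key are hypothetical names added for illustration, and catching a broad Exception is an assumption, since the exact error type is not stated above:

```python
def extract_article(url, article_factory=None):
    """Download and parse an article, treating the NLP step as optional."""
    if article_factory is None:
        # Imported lazily so the function can be exercised with a stand-in class.
        from newspaper import Article
        article_factory = Article

    article = article_factory(url)
    article.download()
    article.parse()
    result = {"title": article.title, "text": article.text}

    try:
        article.nlp()
    except Exception as exc:  # assumption: the exact exception type may vary
        # Record the failure instead of losing the parsed content.
        result["nlp_error"] = "{}: {}".format(type(exc).__name__, exc)
    else:
        result["keywords"] = article.keywords
        result["summary"] = article.summary
    return result


def lambda_handler(event, context):
    # "url" is a hypothetical key; use whatever your event actually carries.
    return extract_article(event["url"])
```

With this shape the function degrades to title and text only, and the `nlp_error` field shows what went wrong in that environment.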
Comment by bisoldi
Fri Oct 20 18:45:36 2017
I don't have one, except to say that AWS just recently released an AMI that has Python 3.6 already installed and when I filed the request, they did indicate they already knew about the SQLite issue and were considering adding it in. I haven't checked it though....
Comment by will3216
Fri May 4 01:34:34 2018
For those who want to run newspaper3k on aws lambda I got it working, and published this template to hopefully save people some time! https://github.com/will3216/newspaper3k_lambda_template
The dependencies are pre-built and checked in, and it works with nltk and whatnot. Instructions for adding additional dependencies are included in the readme, but by default it should work out of the box.
Comment by bisoldi
Fri Jun 29 01:33:14 2018
@will3216 Dude....How did you get the NLTK stuff to work?
I spent far more time than I care to admit trying to get sqlite3 to work in Lambda and couldn't get it to work! AWS even confirmed it's a known issue!
Comment by will3216
Fri Jun 29 21:25:41 2018
@bisoldi Ha! Yeah, that was a pain... I manually copied in a file AWS's python build was missing from this project https://github.com/Miserlou/lambda-packages
I just made some changes to the template I posted above which now allows you to modify the dependencies you are using by using docker to spin up an Amazon Linux AMI to dynamically build/package your lambda function along with its dependencies.
Comment by bisoldi
Mon Jul 9 14:22:00 2018
@will3216 I also got it to work by simply dropping the sqlite library in. I then integrated CircleCI and, in a different repo, implemented the modified newspaper library. If there is any interest, I might open source the modifications.
I assumed that would not be a change acceptable in a PR, unless @codelucas wants it?
Comment by vitaly-zdanevich
Tue Oct 30 19:24:04 2018
This issue is resolved - it looks like it is enough to have this in settings.py:
tempfile.gettempdir()
Against sqlite I have this:
sys.modules['sqlite'] = imp.new_module('sqlite')
sys.modules['sqlite3.dbapi2'] = imp.new_module('sqlite.dbapi2')
See https://stackoverflow.com/a/44532317/1879101
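The `imp` module was removed in Python 3.12, so on newer runtimes the same idea needs `types.ModuleType` instead. A minimal sketch, assuming the only problem is the missing `_sqlite3` C extension that `sqlite3.dbapi2` imports; the helper names `install_stub` and `stub_sqlite_if_missing` are hypothetical:

```python
import importlib
import sys
import types


def install_stub(name):
    """Register an empty placeholder module in sys.modules under `name`."""
    stub = types.ModuleType(name)
    sys.modules[name] = stub
    return stub


def stub_sqlite_if_missing():
    """Stub sqlite3.dbapi2 only when the _sqlite3 extension cannot be imported."""
    try:
        importlib.import_module("_sqlite3")
    except ImportError:
        install_stub("sqlite3.dbapi2")
        return True
    return False
```

Call `stub_sqlite_if_missing()` before importing nltk or newspaper. Anything that really uses SQLite will still fail afterwards; the stub only lets the import go through.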
UPD: OK, I agree - it would be better for the library not to require sqlite when the client code does not use it either, to keep things simple for the user. I hope that will be implemented too.
Comment by palmerabollo
Fri Dec 27 09:17:26 2019
Does anyone know a Lambda Layer containing all the requirements to run newspaper on AWS Lambda?
Comment by bisoldi
Fri Jan 3 13:35:24 2020
I haven't open sourced it (yet??) but I did get it to work as a layer.
Comment by Aditya94A
Wed Jun 3 17:41:50 2020
Does anyone have a 2020 way of doing this with the latest library version?
Comment by Aditya94A
Wed Jun 3 17:47:05 2020
@bisoldi Please do open source your solution, it would be extremely helpful to everyone 😁
Comment by bisoldi
Wed Jun 3 19:25:48 2020
I think I might....though I have found a great many inaccuracies in the extraction of the article's publish date. Do you (does anyone) know if there are any improvements possible in that area?