job_scraper
A job scraper using the Scrapy framework
Simple Job Scraper
~~Searches Stackoverflow & Dice for jobs and saves the results to DynamoDB.~~
Job site aggregator. Scrapes results from multiple job sites and returns them to a web page.
Uses AWS Lambda, Python, Scrapy & Travis CI.
Will eventually use Django for the web app.
To Do (Adapt to new architecture)
- Scrapy saves job items as a list of dictionaries (1 per job).
- Convert the list of dicts to a JSON object.
- Return the JSON of processed jobs from the AWS Lambda function, and build the job elements on the page from the returned JSON.
- Invoke the Lambda function directly from the static page using the AWS JavaScript SDK. (Remove DynamoDB)
- Add a search box and button to the front end to invoke the Lambda function. (Start with job titles)
- Pass search arguments into Scrapy, taken from the data sent with the Lambda invocation by the JavaScript on the static page.
Avoid using API Gateway and DynamoDB. Invoke the Lambda function directly from the page and return the results in the response. No need to store results long term if the scrape is fast enough!
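A rough sketch of how the Lambda side of this plan could look, not the final implementation. `scrape_jobs`, the `job_title` event key, and the item fields (`title`, `company`, `url`) are placeholder names made up for illustration, and the scrape itself is stubbed out with sample data so the sketch runs without Scrapy. It returns a plain dict on the assumption that the Python Lambda runtime serializes the handler's return value to JSON for a direct invocation.

```python
import json


def scrape_jobs(job_title):
    """Hypothetical stand-in for the Scrapy run; returns one dict per job."""
    # Assumption: the real spider would collect items shaped roughly like these.
    return [
        {"title": f"Senior {job_title}", "company": "Example Co",
         "url": "https://example.com/jobs/1"},
        {"title": f"Junior {job_title}", "company": "Sample Inc",
         "url": "https://example.com/jobs/2"},
    ]


def lambda_handler(event, context):
    """Entry point called by AWS Lambda; `event` is the payload sent by the page."""
    # "job_title" is an assumed key name for the search box value.
    job_title = (event or {}).get("job_title", "").strip()
    if not job_title:
        return {"error": "job_title is required", "jobs": []}

    # List of dicts, 1 per job, wrapped so the page gets a single JSON object.
    return {"jobs": scrape_jobs(job_title)}


if __name__ == "__main__":
    # Local check: print the JSON the page would receive for a sample search.
    result = lambda_handler({"job_title": "Python Developer"}, None)
    print(json.dumps(result, indent=2))
```

Wrapping the list in a `jobs` key leaves room to add fields such as an error message later without changing what the front end expects.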
Resources:
Scrapydo documentation (See the scrapydo.run_spider example)
Pass user-defined arguments into Scrapy spiders
Save scrape results into a list of dicts
Convert a list of dicts to JSON in Python
Invoke a Lambda function from JavaScript
Building a Python AWS Lambda deployment package
Old Resources
Build An API To Expose An AWS Lambda Function
Scrapy Throws ReactorNotRestartable on AWS Lambda