Scrape the Gibson

These code snippets are the core of a post I wrote about web scraping in Python. It's aimed at people who have already done a bit of coding but want to explore scraping in Python in more depth. The workshop will be much easier if you have a Mac or a Linux-based computer.

Dependencies

Download repo: https://github.com/abelsonlive/scrape-the-gibson
Install dependencies

If you don't have pip installed, type:

sudo easy_install pip

Change into the repository directory:

cd nyu-skill-share-scraping

Now run:

sudo pip install -r requirements.txt

Topics

Introduction

Getting started with scraping in Python using requests
Exploring HTML documents and extracting data with BeautifulSoup
Saving scraped data to a database with dataset

Advanced

Thinking about ETL (Extract, Transform, Load)
Keeping your source data around
Running multiple requests in parallel to scrape faster
- Thready
Using regular expressions to extract more data
Programmatic crawling of entire sites
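
Several of these ideas can be sketched in one small pipeline. The example below is an assumption-laden illustration, not the workshop's implementation: it uses the standard library's `concurrent.futures.ThreadPoolExecutor` in place of Thready, a hypothetical `FAKE_SITE` dictionary in place of real HTTP requests, and a deliberately simple email pattern. It saves each raw page to disk before parsing (keeping the source data around), fetches pages in parallel, and extracts data with a regular expression. The load step and site-wide crawling are left out.

```python
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-in for the network so the sketch runs offline.
# A real scraper would fetch each URL over HTTP instead.
FAKE_SITE = {
    "http://example.com/a": "<p>Contact: alice@example.com, call 555-0100</p>",
    "http://example.com/b": "<p>Contact: bob@example.com</p>",
}

# Deliberately simple pattern; real-world email matching is messier.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def fetch(url):
    """Return the HTML for a URL (here: looked up in FAKE_SITE)."""
    return FAKE_SITE[url]


def extract(url, raw_dir):
    """Extract: fetch a page and keep the untouched source on disk."""
    html = fetch(url)
    filename = re.sub(r"[^A-Za-z0-9]+", "_", url) + ".html"
    path = Path(raw_dir) / filename
    path.write_text(html, encoding="utf-8")
    return path


def transform(path):
    """Transform: parse a saved page, so parsing can be re-run without re-fetching."""
    html = path.read_text(encoding="utf-8")
    return [{"source": path.name, "email": email} for email in EMAIL_RE.findall(html)]


def run(urls, raw_dir, workers=4):
    """Fetch all URLs in parallel threads, then transform the saved files in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = list(pool.map(lambda url: extract(url, raw_dir), urls))
    rows = []
    for path in paths:
        rows.extend(transform(path))
    return rows


raw_dir = tempfile.mkdtemp()
rows = run(sorted(FAKE_SITE), raw_dir)
```

Separating `extract` from `transform` means a bug in the regular expression can be fixed and the parsing re-run against the saved files, without hitting the site again.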
