socialetl

Project for the "Data pipeline design patterns" blog post.

Project design

flowchart LR
    A[API] -->|Extract| B[Transform]
    B -->|Load| C[Database]

We pull data from the Reddit and Twitter APIs, transform it with Python, and load it into a database.
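As a rough illustration of that shape, here is a minimal Python sketch. The function names, the SocialPost fields other than source, and the socialetl.db file name are hypothetical stand-ins, not the project's actual code:

import sqlite3
from typing import NamedTuple


class SocialPost(NamedTuple):
    source: str   # "reddit" or "twitter"
    post_id: str  # hypothetical field, for illustration
    text: str     # hypothetical field, for illustration


def extract() -> list[SocialPost]:
    # Stand-in for the Reddit/Twitter API calls.
    return [SocialPost("reddit", "abc123", "hello world")]


def transform(posts: list[SocialPost]) -> list[SocialPost]:
    # Illustrative transform: drop posts with empty text.
    return [p for p in posts if p.text.strip()]


def load(posts: list[SocialPost], db_path: str = "socialetl.db") -> None:
    # Insert the transformed rows into the database.
    with sqlite3.connect(db_path) as conn:
        conn.executemany(
            "INSERT INTO social_posts (source, post_id, text) VALUES (?, ?, ?)",
            posts,
        )


load(transform(extract()))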

Prerequisites

  1. Python 3
  2. sqlite3 (comes preinstalled on most OSes)
  3. A Reddit app. You'll need your Reddit app's REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, & REDDIT_USER_AGENT (used to build the API clients, as sketched after this list).
  4. A Twitter API token. You'll need your Twitter API's BEARER_TOKEN.
  5. git

Clone the repository and switch into the project directory:

git clone https://github.com/josephmachado/socialetl.git
cd socialetl
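For reference, here is a minimal sketch of how these credentials are typically used to build API clients, assuming the praw and tweepy libraries; the project's actual client code may differ:

import os

import praw
import tweepy

# Assumes the credentials are available as environment variables,
# e.g. loaded from the .env file described in Setup below.
reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    user_agent=os.environ["REDDIT_USER_AGENT"],
)
twitter = tweepy.Client(bearer_token=os.environ["BEARER_TOKEN"])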

Setup

Create a .env file in the project's root directory with the following content:

REDDIT_CLIENT_ID=replace-with-your-reddit-client-id
REDDIT_CLIENT_SECRET=replace-with-your-reddit-client-secret
REDDIT_USER_AGENT=replace-with-your-reddit-user-agent
BEARER_TOKEN=replace-with-your-twitter-bearer-token
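One common way to load these variables into the environment at runtime is the python-dotenv package; a minimal sketch, assuming that package is installed (whether the project loads them exactly this way is not shown here):

import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads the .env file in the current working directory
reddit_client_id = os.environ["REDDIT_CLIENT_ID"]
bearer_token = os.environ["BEARER_TOKEN"]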

The following commands are to be run via the terminal, from your project's root directory.

python3 -m venv venv # Create a venv
. venv/bin/activate # activate venv
pip install -r requirements.txt # install requirements
make ci # Run tests, check linting, & format code
make reset-db # Creates DB schemas
make reddit-etl # ETL reddit data
make twitter-etl # ETL twitter data
make db # open the db to check ETL-ed data
select source, count(*) from social_posts group by 1;
.exit
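The same check can also be run from Python with the standard-library sqlite3 module; a minimal sketch, assuming the database file is named socialetl.db (check the Makefile's db target for the actual path):

import sqlite3

# Count loaded rows per source (reddit, twitter).
with sqlite3.connect("socialetl.db") as conn:
    for source, count in conn.execute(
        "SELECT source, COUNT(*) FROM social_posts GROUP BY 1"
    ):
        print(source, count)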

Set up git hooks. Create a pre-commit hook as shown below; it runs make ci before every commit, and a failing run aborts the commit.

printf '#!/bin/sh\nmake ci\n' > .git/hooks/pre-commit
chmod ug+x .git/hooks/*

Make commands

We have several make commands to simplify common tasks; please refer to the Makefile to see them all.