census-parquet
Python tools for creating and maintaining Parquet files from US 2020 Census Data.
Installation
To use the data download shell scripts, first install wget.
To install the census-parquet package, use:
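Whether wget is available can be checked before running the download scripts. The following is a minimal sketch using only the Python standard library; the helper name `has_wget` is hypothetical and is not part of census-parquet.

```python
import shutil


def has_wget() -> bool:
    """Return True if a `wget` executable is found on the PATH."""
    # shutil.which returns the executable's path, or None if it is not found.
    return shutil.which("wget") is not None


if __name__ == "__main__":
    if has_wget():
        print("wget found")
    else:
        print("wget not found; install it before running the download scripts")
```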
pip install census-parquet
This will also install the required Python dependencies.
Usage
To run the census-parquet code, use:
run_census_parquet
This runs the following scripts in order:
- download_boundaries.sh
  - This script downloads the Census Boundary data needed to run process_boundaries.py.
- download_population_stats.sh
  - This script downloads population stat data needed for process_blocks.py.
- download_blocks.sh
  - This script downloads the Census Block data needed to run process_blocks.py.
- process_boundaries.py
  - This script processes the Census Boundary data and creates parquet files. The parquet files will be output into a boundary_outputs folder.
- process_blocks.py
  - This script processes Census Block data and creates parquet files. The final combined parquet file will have the name tl_2020_FULL_tabblock20.parquet.
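The ordering above can be sketched as a small runner. This is only an illustration and not how run_census_parquet is actually implemented. It assumes that the five scripts sit in a single directory and that the shell scripts can be run with `sh`. The names `PIPELINE`, `build_command`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the list above, in the order they run.
PIPELINE = [
    "download_boundaries.sh",
    "download_population_stats.sh",
    "download_blocks.sh",
    "process_boundaries.py",
    "process_blocks.py",
]


def build_command(script: str) -> list[str]:
    """Pick an interpreter based on the file extension (an assumption)."""
    if script.endswith(".sh"):
        return ["sh", script]
    if script.endswith(".py"):
        return [sys.executable, script]
    raise ValueError(f"unsupported script type: {script}")


def run_pipeline(script_dir: Path = Path("."), dry_run: bool = True) -> list[list[str]]:
    """Build the commands in order and, unless dry_run is set, execute them.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [build_command(str(script_dir / name)) for name in PIPELINE]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so a failed download stops the later processing steps.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the planned commands without executing anything.
    for cmd in run_pipeline():
        print(" ".join(cmd))
```

The download steps come first because each processing script depends on data fetched by an earlier step, so stopping at the first failure avoids processing incomplete inputs.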