BigFlow
Documentation
- What is BigFlow?
- Getting started
- Installing BigFlow
- Help me
- BigFlow tutorial
- CLI
- Configuration
- Project structure and build
- Deployment
- Workflow & Job
- Starter
- Technologies
- Development
Cookbook
- Monitoring
- Automated end-to-end testing
- Dockerized, GPU based ML prediction process
What is BigFlow?
BigFlow is a Python framework for data processing pipelines on GCP.
The main features are:
- Dockerized deployment environment
- Powerful CLI
- Automated build, deployment, versioning and configuration
- Unified project structure
- Support for GCP data processing technologies: Dataflow (Apache Beam) and BigQuery
- Project starter
Getting started
Start by installing BigFlow on your local machine, then go through the BigFlow tutorial.
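To get a feel for what the tutorial builds, here is a minimal sketch of a workflow definition. It assumes the Workflow and Job interfaces described in the Workflow & Job chapter; SimpleJob and simple_workflow are illustrative names only, not part of the starter project:

import bigflow

class SimpleJob(bigflow.Job):
    # A job is identified by its id and implements execute.
    id = 'simple_job'

    def execute(self, context):
        # The context object carries run parameters (for example the runtime).
        print(f'Running job {self.id}')

# A workflow groups jobs into an ordered definition under a workflow_id.
simple_workflow = bigflow.Workflow(
    workflow_id='simple_workflow',
    definition=[SimpleJob()],
)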
Installing BigFlow
Prerequisites. Before you start, make sure you have the following software installed:
- Python >= 3.7
- Google Cloud SDK
- Docker Engine
You can install the bigflow package globally, but we recommend
installing it locally in a venv inside your project's folder:
python -m venv .bigflow_env
source .bigflow_env/bin/activate
Install the bigflow pip package:
pip install bigflow[bigquery,dataflow]
Test it:
bigflow -h
Read more about BigFlow CLI.
To interact with GCP you need to set a default project and log in:
gcloud config set project <your-gcp-project-id>
gcloud auth application-default login
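Optionally, you can confirm from Python that the application-default credentials are visible. This is a small sketch using the google-auth library (a dependency of the GCP client libraries, not a BigFlow command):

import google.auth

# Loads application-default credentials and reports the project they resolve to.
credentials, project = google.auth.default()
print(f'Application-default credentials found, project: {project}')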
Finally, check that your Docker Engine is running:
docker info
Help me
You can ask questions on our Gitter channel or on Stack Overflow.