bigflow
A Python framework for data processing on GCP.
If no TTL is set on the Google Cloud Storage folder used for Beam temp/staging, storage usage will grow without bound. We need to add a tip for Beam users so they...
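One way to express such a TTL is a GCS object lifecycle rule that deletes objects after a given age. The sketch below only builds the lifecycle configuration as JSON; the function name `delete_after_days_lifecycle`, the 7-day retention period, and the `beam/temp/` prefix are illustrative assumptions, not values taken from BigFlow.

```python
import json


def delete_after_days_lifecycle(age_days, prefix=None):
    """Build a GCS lifecycle configuration that deletes objects older than `age_days`.

    `prefix` optionally limits the rule to object names starting with it
    (lifecycle rules are set per bucket, so a "folder" is just a name prefix).
    """
    if age_days <= 0:
        raise ValueError("age_days must be positive")
    condition = {"age": age_days}
    if prefix is not None:
        condition["matchesPrefix"] = [prefix]
    return {"rule": [{"action": {"type": "Delete"}, "condition": condition}]}


# Hypothetical values: keep Beam temp/staging files for 7 days.
config = delete_after_days_lifecycle(7, prefix="beam/temp/")
lifecycle_json = json.dumps(config, indent=2)
print(lifecycle_json)
```

The resulting JSON can be saved to a file and applied to a bucket with `gsutil lifecycle set <file> gs://<bucket>`. Since the rule deletes data, it is worth checking the prefix and age against the longest-running job before applying it.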
For smoother test creation, we need an `EndToEndTest` class: https://github.com/allegro/bigflow/pull/199#discussion_r557977651
Currently, BigFlow walks the whole package structure to find DAGs. If there is a module with a workflow and BigFlow can't read the module, then BigFlow just skips this...
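To illustrate the alternative to silent skipping, the sketch below walks a package with the standard library and records every module that fails to import, so the caller can report it. This is not BigFlow's actual discovery code; `import_all_modules` and the `demo_workflows` package are hypothetical names created for the example.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def import_all_modules(package_name):
    """Import every submodule of a package, collecting failures instead of hiding them."""
    package = importlib.import_module(package_name)
    loaded, failed = [], {}
    # onerror is a no-op because failures are already recorded in the loop body.
    for info in pkgutil.walk_packages(
        package.__path__, prefix=package_name + ".", onerror=lambda name: None
    ):
        try:
            loaded.append(importlib.import_module(info.name))
        except Exception as exc:
            failed[info.name] = exc
    return loaded, failed


# Build a throwaway package with one importable and one broken module.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "demo_workflows"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "good.py").write_text("WORKFLOW_ID = 'good'\n")
(pkg_dir / "broken.py").write_text("import module_that_does_not_exist_xyz\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

loaded, failed = import_all_modules("demo_workflows")
for name, exc in failed.items():
    print(f"could not import {name}: {exc!r}")
```

Returning the failures alongside the loaded modules leaves the policy to the caller: a build step could abort on any failure, while a listing command could just print a warning.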
Add a part to the tutorial about creating multiple environments.
1. We need to set up a deprecation policy for "breaking changes". 2. Describe this strategy in the docs.
At the moment, the job_failure_count metric contains information about the Composer environment and job_id. We have to add workflow_name and environment info.
At the moment, we use https://pypi.org/ for unstable dev versions. We could use https://test.pypi.org/ instead.
Add a way to create `bigflow.bigquery.DatasetManager` from a generic `bigflow.Config`. This may simplify the creation of mixed workflows (part written with BigQuery, part based on Dataproc, etc.).
Hello, may I kindly ask you to provide some instructions for migrating to the new version, to make the process easier for other developers? In particular, I found that config.py should be modified...
Scaffold may be used by wrapper scripts or in an automation environment. Real use case: it may be integrated into a script that regenerates a static project template (such a script needs to...