bigflow
A Python framework for data processing on GCP.
Bumps [lxml](https://github.com/lxml/lxml) from 4.8.0 to 4.9.1. Changelog sourced from lxml's changelog: 4.9.1 (2022-07-01), bugs fixed: a crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note...
Generated DAG files are not prefixed in any way. Also, during deployment the whole target DAG folder is cleared. This makes it harder to deploy multiple BigFlow projects on a single Cloud Composer/Airflow instance....
Allow users to pass extra ad-hoc parameters (an untyped dict) to a workflow (via 'JobContext')
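A minimal sketch of what this feature request might look like. The field name `extra_params` and this `JobContext` shape are assumptions for illustration, not the actual BigFlow API:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class JobContext:
    """Hypothetical context object; only workflow_id mirrors the request."""
    workflow_id: str
    # The requested untyped dict of ad-hoc parameters (assumed name).
    extra_params: Dict[str, Any] = field(default_factory=dict)


def run_job(context: JobContext) -> str:
    # A job can read ad-hoc parameters from the context at runtime,
    # falling back to a default when a key is absent.
    return context.extra_params.get("mode", "default")
```

A caller could then pass `JobContext("my_workflow", {"mode": "backfill"})` to parameterize a run without changing the job's signature.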
Current behaviour: DAG generation for a single workflow is skipped when its Python package cannot be imported. This leads to incomplete deploys (some DAGs are generated, some are not). Expected: fail...
There are two problems when generating and deploying DAGs: 1. Some workflows can be skipped when generating DAGs, due to errors on module import. 2. When we are deploying DAGs,...
pytest is much more user-friendly. 1. Better runner: colors, stack traces, code snippets, hides "garbage" stdout/stderr/logger output. 2. Concise tests: just `def test(): assert ...` instead of the JUnit approach. 3....
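The conciseness point above can be shown side by side; the test bodies here are trivial placeholders:

```python
# JUnit-style (unittest): a class, inheritance, and camelCase assert methods.
import unittest


class TestAddition(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)


# pytest style: a bare function and a plain assert statement are enough.
# On failure, pytest introspects the expression and prints both operands.
def test_add():
    assert 1 + 1 == 2
```

With pytest there is no required base class or assertion API to memorize; a failing `assert` produces a detailed report on its own.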
If the Dataflow job times out, it is ended with CANCELLED status. Such a job won't be included in the FAILED jobs metric. We should somehow report to GCP monitoring about the...
In BeamJob we subtract two minutes from the initial timeout to keep the Airflow and Beam timeouts compatible. We should validate that this timeout is greater than two minutes.
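A sketch of the requested validation. The function and constant names are assumptions, not BigFlow's actual internals; it only illustrates the "reject timeouts at or below the two-minute margin" rule:

```python
from datetime import timedelta

# Assumed name for the two-minute Airflow/Beam compatibility margin.
TIMEOUT_MARGIN = timedelta(minutes=2)


def effective_beam_timeout(job_timeout: timedelta) -> timedelta:
    """Return the Beam timeout after subtracting the margin.

    Raises ValueError when the configured timeout would leave Beam
    with a zero or negative timeout, instead of failing silently later.
    """
    if job_timeout <= TIMEOUT_MARGIN:
        raise ValueError(
            f"Job timeout {job_timeout} must be greater than {TIMEOUT_MARGIN}"
        )
    return job_timeout - TIMEOUT_MARGIN
```

Validating eagerly at configuration time surfaces the misconfiguration before the job is submitted, rather than producing a confusing runtime cancellation.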
Allow users to reuse BigFlow workflows/jobs via a custom "BigflowWorkflow" operator.