scio
scio copied to clipboard
Refactor BQ to expose all beam's configurations
Here are the main changes:
-
the BQ
Tablesource has a single normalized definition, with multiple constrictors (form string spec orTableReference). It nows includes an optionalTable.Filterthat can be used is the storage read API to project and filter. -
read API changes with
| API | method | returned type |
|---|---|---|
| bigQuerySelect | export | TableRow |
| bigQuerySelectFormat | export | T |
| bigQueryTable | export | TableRow |
| bigQueryTableFormat | export | T |
| bigQueryStorage | direct | TableRow |
| bigQueryStorageFormat | direct | T |
| typedBigQuery | export | T |
| typedBigQueryStorage | direct | T |
Format API take a BigqueryIO.Format object allowing to convert either from GenericRecord (this should be prefered) or TableRow
- The
StorageApi allow to pass anErrorHandler(Fix #5530). In order to preserve a flat structureScioContext.errorSink(): ErrorSinkhas been added. This allow to do the following
val errorSink = sc.errorSink()
sc.bigQueryStorageFormat[MyType](
table,
format,
errorHandler = errorSink.handler
)
val errors: SCollection[BadRecord] = errorSink.sink()
The handler can be passed to multiple IOs before sink is materialized. The sink will flatten the errors from the IOs.