scio
BatchDoFn and scio batch API on SCollection
Amortize processing cost by batching elements locally. Batching respects windowing.
This aims to provide an API symmetric with the KV batching in #4458.
As the batch is emitted on `finishBundle`, no `maxBufferingDuration` is required.
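To illustrate the idea described above, here is a minimal sketch in plain Scala. It is not the `BatchDoFn` from this PR: `LocalBatcher`, `processElement`, and the `weigh` parameter are hypothetical names, and the rule of emitting a batch once the accumulated weight reaches the limit is an assumption. It only shows why a buffering timeout is unnecessary: elements are buffered per window, and whatever is left is flushed when the bundle finishes, so nothing outlives the bundle.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a batching DoFn: plain Scala, no Beam or scio types.
// W is the window key, T the element type, `weigh` gives the cost of one element.
final class LocalBatcher[W, T](maxWeight: Long, weigh: T => Long) {

  // Per-window buffer with its accumulated weight.
  private final class Buffer {
    val elements = mutable.ArrayBuffer.empty[T]
    var weight = 0L
  }

  // One buffer per live window, so a batch never mixes windows.
  private val buffers = mutable.LinkedHashMap.empty[W, Buffer]

  // Buffers the element and returns a batch once the window's weight
  // reaches maxWeight (assumed threshold rule); otherwise returns None.
  def processElement(window: W, element: T): Option[(W, List[T])] = {
    val buffer = buffers.getOrElseUpdate(window, new Buffer)
    buffer.elements += element
    buffer.weight += weigh(element)
    if (buffer.weight >= maxWeight) Some(window -> flush(window)) else None
  }

  // Emits every partially filled batch at the end of the bundle.
  // The key list is materialized first because flush mutates the map.
  def finishBundle(): List[(W, List[T])] =
    buffers.keys.toList.map(window => window -> flush(window))

  // Removes the window's buffer and returns its elements.
  private def flush(window: W): List[T] =
    buffers.remove(window).map(_.elements.toList).getOrElse(Nil)
}
```

In this sketch a fixed-size batch is just the weighted case with `weigh = _ => 1L`, while a weighted batch passes a real cost function such as a string length or a serialized size.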
Codecov Report
Merging #4489 (e207766) into main (36935c1) will decrease coverage by 0.29%. The diff coverage is 54.05%.
:exclamation: Current head e207766 differs from pull request most recent head 3e7625b. Consider uploading reports for the commit 3e7625b to get more accurate results
@@ Coverage Diff @@
## main #4489 +/- ##
==========================================
- Coverage 60.48% 60.19% -0.30%
==========================================
Files 275 275
Lines 9882 10061 +179
Branches 438 840 +402
==========================================
+ Hits 5977 6056 +79
- Misses 3905 4005 +100
Impacted Files | Coverage Δ | |
---|---|---|
...in/scala/com/spotify/scio/values/SCollection.scala | 88.38% <0.00%> (-5.27%) | :arrow_down: |
...scala/com/spotify/scio/bigquery/MockBigQuery.scala | 0.00% <0.00%> (ø) | |
...la/com/spotify/scio/bigquery/client/TableOps.scala | 0.00% <0.00%> (ø) | |
...a/com/spotify/scio/testing/TransformOverride.scala | 100.00% <100.00%> (ø) | |
...om/spotify/scio/elasticsearch/CoderInstances.scala | 44.11% <0.00%> (-5.89%) | :arrow_down: |
...om/spotify/scio/elasticsearch/CoderInstances.scala | 42.42% <0.00%> (-5.86%) | :arrow_down: |
...com/spotify/scio/bigquery/types/TypeProvider.scala | 47.22% <0.00%> (-2.78%) | :arrow_down: |
...la/com/spotify/scio/bigquery/client/BigQuery.scala | 22.44% <0.00%> (-2.56%) | :arrow_down: |
...rc/main/scala/com/spotify/scio/util/ScioUtil.scala | 59.25% <0.00%> (-2.28%) | :arrow_down: |
...n/scala/com/spotify/scio/extra/annoy/package.scala | 80.00% <0.00%> (-2.06%) | :arrow_down: |
... and 42 more |
I ran some tests on Dataflow with gs://apache-beam-samples/shakespeare/kinglear.txt as input:
- fixed-size batches are respected
- weighted batches are respected
@clairemcginty all comments should be addressed. I managed to trick the tests into producing a single bundle, which ensures that batching works within the bundle.