scio
Improve KV batch API
I realized that scio only exposes Beam's `GroupIntoBatches` with `ofSize`.
- This can be problematic in streaming pipelines when keys have few elements and a batch takes a long time to fill. This PR adds `maxBufferingDuration` to the API.
- `GroupIntoBatches` also allows batching by byte size, using the coder size by default.
- 'Abusing' `byteSize` with a custom weight lets users batch on an arbitrary element metric.
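To make the flush conditions described above concrete, here is a minimal plain-Scala sketch of per-key batching with a weight limit and a buffering deadline. It is a simplified model for illustration, not scio or Beam code: `BatchSketch`, `Event`, `batch`, and their parameters are hypothetical names, and the exact flush rules in Beam's `GroupIntoBatches` may differ (for example, a real pipeline flushes on a timer, whereas this model only checks the deadline when the next element arrives or the input ends).

```scala
object BatchSketch {
  // Hypothetical element wrapper: a value plus its arrival time in milliseconds.
  final case class Event[V](value: V, arrivalMillis: Long)

  // Models batching for the elements of a single key.
  // A batch is closed when its accumulated weight reaches maxWeight, or when an
  // element arrives at least maxBufferingMillis after the batch's first element.
  def batch[V](events: Seq[Event[V]], maxWeight: Long, maxBufferingMillis: Long)(
    weight: V => Long
  ): Vector[Vector[V]] = {
    // State: (closed batches, open buffer, buffer weight, arrival time of first buffered element)
    val init = (Vector.empty[Vector[V]], Vector.empty[V], 0L, 0L)
    val (closed, open, _, _) = events.foldLeft(init) { case ((acc, cur, w, start), e) =>
      // Time-based flush: the open batch has been buffering for too long.
      val expired = cur.nonEmpty && e.arrivalMillis - start >= maxBufferingMillis
      val (acc1, cur1, w1) =
        if (expired) (acc :+ cur, Vector.empty[V], 0L) else (acc, cur, w)
      val start1 = if (cur1.isEmpty) e.arrivalMillis else start
      val cur2 = cur1 :+ e.value
      val w2 = w1 + weight(e.value)
      // Weight-based flush: the limit is reached once this element is added.
      if (w2 >= maxWeight) (acc1 :+ cur2, Vector.empty[V], 0L, 0L)
      else (acc1, cur2, w2, start1)
    }
    // End of input: emit whatever is still buffered.
    if (open.nonEmpty) closed :+ open else closed
  }
}
```

With `weight = _ => 1L` the model reduces to count-based batching in the spirit of `ofSize`; passing any other function is the "custom weight" idea from the last bullet. The deadline is what keeps a sparse key from holding a partial batch indefinitely.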
Codecov Report
Merging #4458 (93daa52) into main (406faf5) will increase coverage by 0.01%. The diff coverage is 100.00%.
:exclamation: Current head 93daa52 differs from pull request most recent head 7b5a032. Consider uploading reports for the commit 7b5a032 to get more accurate results
@@ Coverage Diff @@
## main #4458 +/- ##
==========================================
+ Coverage 60.50% 60.51% +0.01%
==========================================
Files 275 275
Lines 9882 9888 +6
Branches 444 441 -3
==========================================
+ Hits 5979 5984 +5
- Misses 3903 3904 +1
Impacted Files | Coverage Δ | |
---|---|---|
...spotify/scio/values/PairSCollectionFunctions.scala | 98.18% <100.00%> (+0.06%) | :arrow_up: |
...y/scio/values/PairSkewedSCollectionFunctions.scala | 93.24% <0.00%> (-2.71%) | :arrow_down: |
...src/main/scala/com/spotify/scio/coders/Coder.scala | 88.83% <0.00%> (+0.50%) | :arrow_up: |