scio icon indicating copy to clipboard operation
scio copied to clipboard

Improve KV batch API

Open RustedBones opened this issue 1 year ago • 1 comments

I realized that sio is only having an API for beam GroupIntoBatches with ofSize

  1. this can be problematic in streaming pipelines when keys have few elements and batch takes time to fill. Adding the maxBufferingDuration to the API
  2. GroupIntoBatches also allows to batch by byte size, using by default the coder size
  3. 'Abusing' byteSize with custom weight, so users can batch on arbitrary element metric

RustedBones avatar Jul 14 '22 07:07 RustedBones

Codecov Report

Merging #4458 (93daa52) into main (406faf5) will increase coverage by 0.01%. The diff coverage is 100.00%.

:exclamation: Current head 93daa52 differs from pull request most recent head 7b5a032. Consider uploading reports for the commit 7b5a032 to get more accurate results

@@            Coverage Diff             @@
##             main    #4458      +/-   ##
==========================================
+ Coverage   60.50%   60.51%   +0.01%     
==========================================
  Files         275      275              
  Lines        9882     9888       +6     
  Branches      444      441       -3     
==========================================
+ Hits         5979     5984       +5     
- Misses       3903     3904       +1     
Impacted Files Coverage Δ
...spotify/scio/values/PairSCollectionFunctions.scala 98.18% <100.00%> (+0.06%) :arrow_up:
...y/scio/values/PairSkewedSCollectionFunctions.scala 93.24% <0.00%> (-2.71%) :arrow_down:
...src/main/scala/com/spotify/scio/coders/Coder.scala 88.83% <0.00%> (+0.50%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us.

codecov[bot] avatar Jul 14 '22 08:07 codecov[bot]