Avoiding saveAsElasticsearch() reshuffle

Open mdvorsky opened this issue 5 years ago • 4 comments

saveAsElasticsearch() currently reshuffles all data into a fixed number of shards. What do you think about adding an option to skip the AssignToShard transform? For our use case, the reshuffle adds unnecessary overhead. (We limit concurrent access to ES with --maxNumWorkers, which is a much more efficient way of limiting parallelism than shuffling a lot of data into a small number of very hot keys.)
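
A rough illustration of the difference, using plain Scala collections rather than Scio or Beam code. The names `assignToShards` and `batchInPlace` are hypothetical and not part of the library; the round-robin keying is an assumption made only to keep the sketch deterministic.

```scala
object ShardSketch {
  // Hypothetical stand-in for an "assign to shard" step: every element is
  // re-keyed into one of numShards groups, so all data moves and the number
  // of groups (and therefore writers) is capped at numShards.
  def assignToShards[T](elements: Seq[T], numShards: Int): Map[Int, Seq[T]] =
    elements.zipWithIndex
      .groupBy { case (_, index) => index % numShards }
      .map { case (shard, pairs) => shard -> pairs.map(_._1) }

  // Hypothetical alternative without re-keying: each worker cuts the
  // partition it already holds into batches of at most batchSize elements.
  // Parallelism is then bounded by the number of workers instead.
  def batchInPlace[T](partition: Seq[T], batchSize: Int): Seq[Seq[T]] =
    partition.grouped(batchSize).toSeq
}
```

With 100 elements and 3 shards, the first function funnels everything through 3 keys no matter how many workers are running; the second leaves the data where it is and only needs a batch size.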

mdvorsky avatar May 27 '19 15:05 mdvorsky

@regadas WDYT? @mdvorsky if you think it's a trivial change, mind submitting a PR and we can discuss there?

nevillelyh avatar Jun 12 '19 13:06 nevillelyh

Yeah we can definitely disable sharding! This will require adding a config option for the batch size.

regadas avatar Jul 01 '19 16:07 regadas

@alexclare will take a stab at this one. Thanks!

regadas avatar Sep 30 '20 17:09 regadas

@regadas isn't that resolved too?

RustedBones avatar Apr 20 '22 11:04 RustedBones