scio
Avoiding saveAsElasticsearch() reshuffle
saveAsElasticsearch() currently reshuffles all data into a fixed number of shards. What do you think about adding an option to skip the AssignToShard transform? For our use case, the reshuffle adds unnecessary overhead. (We limit concurrent access to ES with --maxNumWorkers, which is a much more efficient way of limiting parallelism than shuffling a lot of data into a small number of very hot keys.)
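To make the trade-off concrete, here is a small plain-Python simulation. It is not scio or Beam code. The function names, the random shard assignment, the bundle sizes and the batch size are all assumptions chosen for illustration. With sharding, every document crosses a shuffle onto one of `num_shards` keys, so each key carries roughly `len(docs) / num_shards` documents. Without it, each worker cuts its own bundles into bulk requests, so parallelism is bounded by the worker count and the bulk batch size has to be set explicitly.

```python
import random
from collections import defaultdict

# Simulation only: models the two write strategies, not the scio/Beam API.


def chunk(items, size):
    """Split a list into consecutive bulk batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def batches_with_sharding(docs, num_shards, batch_size, seed=0):
    """Shard-then-write path (assumed random key assignment): each doc gets a
    key in [0, num_shards), docs are grouped by key (the reshuffle), then each
    group is cut into bulk batches. Every doc crosses the shuffle."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for doc in docs:
        groups[rng.randrange(num_shards)].append(doc)
    return {key: chunk(group, batch_size) for key, group in groups.items()}


def batches_without_sharding(bundles, batch_size):
    """No-shuffle path: each worker batches the docs of its own bundles in
    place; concurrency is bounded by the number of workers instead."""
    return [chunk(bundle, batch_size) for bundle in bundles]


docs = [{"id": i} for i in range(1000)]

sharded = batches_with_sharding(docs, num_shards=5, batch_size=100)
per_key = {key: sum(len(b) for b in batches) for key, batches in sharded.items()}
print(per_key)  # roughly 200 docs funnelled through each of the 5 keys

# Hypothetical bundles: 4 workers, 250 docs each.
bundles = [docs[i:i + 250] for i in range(0, len(docs), 250)]
unsharded = batches_without_sharding(bundles, batch_size=100)
print([[len(b) for b in batches] for batches in unsharded])
# → [[100, 100, 50], [100, 100, 50], [100, 100, 50], [100, 100, 50]]
```

Both paths write the same 1000 documents. The sharded path additionally moves all of them through a small number of keys before any bulk request is sent, which is the overhead the issue describes.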
@regadas WDYT? @mdvorsky if you think it's a trivial change, would you mind submitting a PR so we can discuss it there?
Yeah, we can definitely disable sharding! This will require adding a config option for the batch size.
@alexclare will take a stab at this one. Thanks!
@regadas isn't this resolved too?