mindbender Single-threaded jq can be a bottleneck for ES indexing

Open alldefector opened this issue 10 years ago • 5 comments

We could backport this change to parallelize indexing: https://github.com/HazyResearch/mindbender/commit/bc869e855b62104928506d15611fb2329c786b12

It simply uses parallel instead of split. Two improvements would be great to include in the backport:

- check for the presence of parallel
- make the parallelization parameters configurable
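
For the two items above, here is a minimal sketch of what the check and the knobs could look like. The names `run_filter`, `INDEX_JOBS` and `INDEX_BLOCK` are hypothetical and not taken from the linked commit; the options used (`--pipe`, `--keep-order`, `--jobs`, `--block`) are regular GNU parallel options.

```shell
#!/usr/bin/env bash
# Hypothetical knobs (assumed names, not existing mindbender settings).
: "${INDEX_JOBS:=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}"  # concurrent jobs
: "${INDEX_BLOCK:=10M}"                                                # stdin block size per job

# True only if the `parallel` on PATH is GNU parallel
# (moreutils ships an unrelated tool with the same name).
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'
}

# run_filter CMD: pass stdin through the shell command string CMD.
# With GNU parallel, stdin is cut into blocks on line boundaries and each
# block is filtered by its own CMD process; otherwise CMD runs once.
run_filter() {
    local cmd=$1
    if have_gnu_parallel; then
        parallel --pipe --keep-order --jobs "$INDEX_JOBS" --block "$INDEX_BLOCK" "$cmd"
    else
        sh -c "$cmd"
    fi
}

# Example (the producer command is a placeholder):
#   some_producer | INDEX_JOBS=8 run_filter 'jq -c .'
```

This only works for filters that treat each input line independently, which is the case for line-delimited JSON going through jq.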

Oct 04 '15 18:10 alldefector

Here is another indexing speed optimization: https://github.com/HazyResearch/mindbender/commit/8d4169ab6784236f21e3caf7c794830f54b66357

Oct 04 '15 18:10 alldefector

Thanks for the suggestions! Yeah, I was anticipating we'd need parallel indexing pretty soon. I had a bad experience with GNU parallel (it was unstable, bloated, and its CLI changed too much across versions), but I will backport these soon, maybe using the more familiar xargs or embedding an exact version of parallel.
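
A rough sketch of the xargs route, assuming an xargs that supports `-0` and `-P` (both GNU and BSD xargs do, though neither option is in POSIX). The function name `parallel_filter` and its arguments are made up for illustration; it keeps the existing split step and only parallelizes the per-chunk filtering.

```shell
#!/usr/bin/env bash
# parallel_filter CMD [JOBS] [LINES]  (hypothetical helper)
# Reads stdin, filters it through the shell command string CMD using up to
# JOBS concurrent processes, and writes the result to stdout in input order.
parallel_filter() {
    local cmd=$1 jobs=${2:-4} lines=${3:-1000} tmp
    tmp=$(mktemp -d) || return 1

    # Cut stdin into chunk files of at most $lines lines each
    # (chunk-aaaa, chunk-aaab, ...).
    split -a 4 -l "$lines" - "$tmp/chunk-"

    # Collect the chunk files; the glob stays unexpanded if stdin was empty.
    set -- "$tmp"/chunk-*
    if [ -e "$1" ]; then
        # One sh per chunk, at most $jobs at a time; each writes chunk-XXXX.out.
        printf '%s\0' "$@" |
            xargs -0 -n 1 -P "$jobs" sh -c "$cmd"' < "$1" > "$1.out"' _
        # Glob expansion is sorted, so the outputs come back in input order.
        cat "$tmp"/chunk-*.out
    fi
    rm -rf "$tmp"
}

# Example (the producer command is a placeholder):
#   some_producer | parallel_filter 'jq -c .' 8 5000
```

Compared with `parallel --pipe`, this buffers the whole input on disk before any filtering starts, so it trades some latency and temporary space for not depending on GNU parallel.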

Side question: after parallelizing, is there any sign of ES becoming the new bottleneck? Would adding more nodes to the ES cluster help? keep-elasticsearch-during currently launches an isolated single-node ES server, but we could enhance it and introduce a subcommand like mindbender search join-cluster to make it easy to scale out.

Oct 04 '15 19:10 netj

No, ES seems to have a very flexible thread pool scheme in one node and can saturate all cores. I suspect that even if there is only one shard, it's still able to saturate all cores. If hardware is the bottleneck, then yeah, we could add new node support.

Oct 04 '15 19:10 alldefector

I see. Sounds like the cluster size should be decided by the query-time latency requirements.

Oct 04 '15 19:10 netj

Another key performance knob is ES_HEAP_SIZE: https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html

But the default ES heap size is 0.25-1G, so we may want to use a different default.
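
One possible way to pick a different default, as a sketch only: follow the linked guide's advice of giving ES about half of physical memory while staying below 32G. The function name `default_es_heap_size` and the 31G cap are assumptions for illustration, and the memory lookup below only covers Linux.

```shell
#!/usr/bin/env bash
# default_es_heap_size TOTAL_MB  (hypothetical helper)
# Prints half of TOTAL_MB, capped at 31744 MB (31G), as a value for ES_HEAP_SIZE.
default_es_heap_size() {
    local total_mb=$1 cap_mb=31744 half_mb
    half_mb=$(( total_mb / 2 ))
    if [ "$half_mb" -gt "$cap_mb" ]; then
        half_mb=$cap_mb
    fi
    echo "${half_mb}m"
}

# Respect a user-provided ES_HEAP_SIZE; otherwise derive one on Linux.
if [ -z "${ES_HEAP_SIZE:-}" ] && [ -r /proc/meminfo ]; then
    total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    ES_HEAP_SIZE=$(default_es_heap_size "$total_mb")
    export ES_HEAP_SIZE
fi
```

On a shared machine half of the memory may be too generous, so an explicit ES_HEAP_SIZE in the environment always wins here.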

Oct 04 '15 19:10 alldefector
