til
til copied to clipboard
How many shards should I have in my Elasticsearch cluster?
1. How does shard size affect performance?
In Elasticsearch, each query is executed in a single thread per shard. Multiple shards can however be processed in parallel, as can multiple queries and aggregations against the same shard.
This means that the minimum query latency, when no caching is involved, will depend on the data, the type of query, as well as the size of the shard. Querying lots of small shards will make the processing per shard faster, but as many more tasks need to be queued up and processed in sequence, it is not necessarily going to be faster than querying a smaller number of larger shards. Having lots of small shards can also reduce the query throughput if there are multiple concurrent queries.
2. How many shards should I have in my Elasticsearch cluster?
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health.
https://thoughts.t37.net/designing-the-perfect-elasticsearch-cluster-the-almost-definitive-guide-e614eabc1a87#3615
- less 3M documents: 1 shard
- between 3M and 5M documents with an expected growth over 5M: 2 shards.
- More than 5M: int (number of expected documents / 5M +1)
https://tech.ebayinc.com/engineering/elasticsearch-performance-tuning-practice-at-ebay/
Too small a shard number would make the search unable to scale out. For example, if the shard number is set to 1, all documents in your index would be stored in one shard. For every search, only one node can be involved. It’s time consuming if you have a lot of documents. From another side, creating an index with too many shards is also harmful to performance, because Elasticsearch needs to run queries on all shards, unless a routing key is specified in the request, then fetch and merge all returned results together.
From our experience, if the index is smaller than 1G, it’s fine to set the shard number to 1. For most scenarios, we can leave the shard number as the default value 5, but if shard size exceeds 30GB, we should increase the shard number to split the index into more shards.
The shard number cannot be changed once an index is created, but we can create a new index and use the reindex API to move data.
We tested an index that has 100 million documents and is about 150GB. We used 100 threads to send search requests.
https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index
Other suggestion from AWS https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html