FEATURE: Parameterize number of shards per entity schema
Is your feature request related to a problem? Please describe.
We maintain separate ES indexes per FtM entity schema. For example, all Company entities are stored in one index, and all Person entities are stored in another index.
Depending on the nature of the data in an Aleph instance, some indexes will be larger and some will be smaller. However, it is recommended that index shards are approx. 10-50 GB in size each (although the optimal size may be different depending on use case). That means the number of shards per index needs to be adjusted based on the size of the index.
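To make the sizing arithmetic concrete, here is a minimal sketch of how a shard count could be derived from an expected index size. The function name `estimate_shard_count` and the 30 GB default target are assumptions chosen for illustration (30 GB simply sits inside the 10-50 GB range mentioned above); neither comes from Aleph or Elasticsearch.

```python
import math


def estimate_shard_count(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near the target size.

    Hypothetical helper: the 30 GB default is an assumption, picked from
    within the 10-50 GB range quoted in this issue.
    """
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    # Round up so no shard exceeds the target; an index always has at least one shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))
```

With these assumptions, a 200 GB index would get 7 shards, while an index holding only a few GB would get a single shard.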
Aleph currently hard-codes the number of shards per index, based on the typical use cases for Aleph instances. By default, indexes have 5 shards. However, some indexes have more shards (the Pages index, which stores PDFs, Word documents, etc., uses 10 shards) and some have fewer (the Passport index uses only 1 shard).
However, in the real world, Aleph instances come in many different sizes. Some are used to index only a handful of documents, while others store terabytes of data.
Describe the solution you'd like
We should be able to adjust the number of shards per index using configuration options/environment variables so we can use different values for different Aleph instances.
However, this doesn’t help if an existing Aleph instance grows too large, as the number of shards of an existing index cannot be adjusted arbitrarily.
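As a rough sketch of what such a configuration option could look like: the environment variable names `ALEPH_INDEX_SHARDS_<SCHEMA>` and `ALEPH_INDEX_SHARDS_DEFAULT`, and the helper `shards_for_schema`, are hypothetical and not existing Aleph settings. The built-in defaults mirror the values quoted in this issue (5 in general, 10 for Pages, 1 for Passport). The lookup order used here (per-schema variable, then built-in schema default, then global variable, then 5) is one possible design, not a decided one.

```python
import os
from typing import Mapping, Optional

# Built-in values as described in this issue; all names below are hypothetical.
DEFAULT_SHARDS = 5
SCHEMA_DEFAULTS = {"Pages": 10, "Passport": 1}


def shards_for_schema(schema: str, env: Optional[Mapping[str, str]] = None) -> int:
    """Return the shard count to use when creating the index for a schema."""
    env = os.environ if env is None else env
    # 1. A per-schema variable, e.g. ALEPH_INDEX_SHARDS_COMPANY, wins.
    raw = env.get(f"ALEPH_INDEX_SHARDS_{schema.upper()}")
    if raw is None:
        # 2. Otherwise keep the built-in value for schemata that have one.
        if schema in SCHEMA_DEFAULTS:
            return SCHEMA_DEFAULTS[schema]
        # 3. Otherwise use the global variable, falling back to 5.
        raw = env.get("ALEPH_INDEX_SHARDS_DEFAULT", str(DEFAULT_SHARDS))
    shards = int(raw)
    if shards < 1:
        raise ValueError(f"Invalid shard count for {schema}: {shards}")
    return shards


def index_settings(schema: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the settings body passed to Elasticsearch at index creation."""
    return {"settings": {"index": {"number_of_shards": shards_for_schema(schema, env)}}}
```

Because `number_of_shards` is applied when an index is created, a value resolved this way would only affect newly created indexes, which matches the limitation described above.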
Describe alternatives you've considered
-/-
Additional context
- ES docs: Size your shards
- Hard coded shard sizes
- Why I’m opening this issue now: https://github.com/alephdata/aleph/issues/3161#issuecomment-1617943899