Partitioning parent-child documents by parent id to avoid full replication of parent documents
Is your feature request related to a problem? Please describe.
I have many documents (200M+) with a complex hierarchical structure that cannot be expressed with nested structures such as arrays.
Describe the solution you'd like Some way of expressing the sharding logic where parent IDs are used as partition keys. This would let us avoid copying all parent documents to all content nodes.
Describe alternatives you've considered Considered all alternatives available today; each of them has certain limitations:
- nested structures
- parent-child with fewer, beefier content nodes to reduce storage overhead
- full denormalisation.
Hi, and thanks for the feature request!
Yes, the copy-parent-to-all-nodes feature can require a lot of memory, particularly in clusters with many nodes.
What you are describing is a kind of distributed join feature. https://github.com/vespa-engine/sample-apps/tree/master/examples/joins may be relevant as a way to solve problems where the parent/child feature cannot be used.
For reference for others: https://docs.vespa.ai/en/parent-child.html
This has been considered before and does make sense. We would need to use some variant of document IDs with groups and distribute the same groups to the same nodes across those document types. I think the main issue is that we'll easily end up with badly balanced clusters.
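Vespa's document-id scheme already has a `g=` group modifier that co-locates documents of the same type in the same buckets; the idea above would extend that co-location across document types. A hypothetical illustration (the namespace, types, and IDs are made up):

```
id:music:artist:g=artist-1:artist-1   # parent document
id:music:album:g=artist-1:album-42    # child: same group key, ideally same node set
```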
In theory this can be done with consistent hashing; ZooKeeper is already used in Vespa anyway. Alternatively, this concern can be shifted to the user, e.g. "you've been warned, this grouping is supposed to be used with well-balanced data".
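A minimal sketch of the consistent-hashing idea (plain hash ring with virtual nodes, not Vespa's actual CRUSH-based distribution; all names are illustrative):

```python
import hashlib
from bisect import bisect

def _hash(key: str) -> int:
    # Stable 64-bit hash so placement is deterministic across processes.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring mapping group keys (e.g. parent IDs) to nodes."""

    def __init__(self, nodes, vnodes=64):
        # Give each node several virtual points to smooth the key distribution.
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def nodes_for(self, group: str, replicas: int = 2):
        """Walk clockwise from the group's hash, collecting distinct nodes."""
        out, i = [], bisect(self.keys, _hash(group))
        while len(out) < replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = HashRing([f"node{i}" for i in range(8)])
# Parent and children share the parent ID as group key, so they land together:
assert ring.nodes_for("parent-123") == ring.nodes_for("parent-123")
```

Placing a group on a small replica set this way avoids full replication of parents, at the cost of the balancing issue discussed above: the ring balances *groups*, not group *sizes*.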
Yes, we do use something similar to consistent hashing in Vespa (the CRUSH algorithm), but here we would need to distribute each group to a limited set of nodes to avoid placing all global documents on all nodes, while we have no control over the size of each group. I'm not sure how well this can be solved, but it certainly adds new complexity to the balancing problem.
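To illustrate the balancing concern: if group sizes are heavy-tailed and each group is pinned to a small fixed subset of nodes, per-node load can diverge sharply even with a uniform hash. A toy simulation (illustrative numbers and rendezvous hashing, not Vespa's actual placement):

```python
import hashlib
import random

def pick_nodes(group: str, nodes, k: int = 2):
    """Deterministically pin a group to k nodes (rendezvous hashing)."""
    score = lambda n: hashlib.md5(f"{group}|{n}".encode()).digest()
    return sorted(nodes, key=score)[:k]

random.seed(42)
nodes = [f"node{i}" for i in range(10)]
load = {n: 0 for n in nodes}

# Heavy-tailed group sizes: a few parents have far more children than the rest.
for g in range(1000):
    size = int(random.paretovariate(1.2))
    for n in pick_nodes(f"parent-{g}", nodes):
        load[n] += size

print(sorted(load.values()))  # loads spread widely despite uniform group placement
```

Groups themselves spread evenly, but a single huge group still lands wholesale on its k nodes, which is exactly the imbalance that has no obvious fix when group sizes are user-controlled.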