
Distributed indexing / control plane [design]

Open fulmicoton opened this issue 2 years ago • 5 comments

Currently Quickwit's indexing is not distributed.

Our plan for distributed indexing is to dispatch indexing over a set of indexing nodes, using a (subset of partitions of a source) as the indivisible unit of work.

Let me describe these concepts again here. Indexing works by starting a so-called indexing pipeline, which consists of a managed set of actors.

The first actor is called a source. It is in charge of defining a partially ordered (more on this in a second) stream of documents and pushing them to the indexer.

At the end of the pipeline, the publisher publishes splits. This consists of uploading the split file to an object storage and updating the metastore. During that update, the split and its metadata are added AND the position up to which indexing has been done is updated. That position is expressed as a "Checkpoint".
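
To make the publish step concrete, here is a minimal sketch in Rust. `ObjectStorage`, `Metastore`, `SplitMetadata`, and `CheckpointDelta` are simplified, hypothetical types for illustration, not Quickwit's actual API.

```rust
// Hypothetical, simplified types; not Quickwit's actual API.
struct SplitMetadata {
    split_id: String,
    num_docs: usize,
}

struct CheckpointDelta {
    partition_id: String,
    from_position: u64,
    to_position: u64,
}

trait ObjectStorage {
    fn upload(&self, key: &str, bytes: &[u8]) -> Result<(), String>;
}

trait Metastore {
    /// Adds the split metadata AND advances the source checkpoint
    /// in a single atomic metastore update.
    fn publish_split(
        &self,
        split: SplitMetadata,
        checkpoint_delta: CheckpointDelta,
    ) -> Result<(), String>;
}

/// The publish step: upload the split file to object storage, then
/// atomically record the split and the new indexing position
/// (the "checkpoint") in the metastore.
fn publish(
    storage: &dyn ObjectStorage,
    metastore: &dyn Metastore,
    split: SplitMetadata,
    split_bytes: &[u8],
    checkpoint_delta: CheckpointDelta,
) -> Result<(), String> {
    let key = format!("{}.split", split.split_id);
    storage.upload(&key, split_bytes)?;
    metastore.publish_split(split, checkpoint_delta)
}
```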

As a first approximation, it is ok to consider the source as a log, where each doc is an entry with a growing sequence number.

In reality, for scaling purposes, sources are partitioned. Each partition is its own independent log, and messages are only ordered within a given log.
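
As a concrete illustration, a checkpoint can be modeled as a map from partition id to the last published position. The type below is a hypothetical simplification, not Quickwit's actual checkpoint type.

```rust
use std::collections::BTreeMap;

/// Simplified sketch: the source checkpoint records, for each partition
/// (each being its own independent log), the position up to which
/// documents have been indexed and published.
#[derive(Debug, Default)]
struct SourceCheckpoint {
    /// partition id -> last published position in that partition's log
    positions: BTreeMap<String, u64>,
}

impl SourceCheckpoint {
    /// Advances a partition's position, refusing to move backwards.
    fn advance(&mut self, partition_id: &str, to_position: u64) -> Result<(), String> {
        let entry = self.positions.entry(partition_id.to_string()).or_insert(0);
        if to_position < *entry {
            return Err(format!(
                "cannot move partition {} backwards: {} < {}",
                partition_id, to_position, *entry
            ));
        }
        *entry = to_position;
        Ok(())
    }
}
```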

In order to make indexing distributed, we will need a way to decide which node should take care of the indexing of which (index, set of source partitions), react to the loss of nodes, etc.

Ideally the design should fit both in an on-premise environment, AND on k8s.
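
For illustration, here is a hypothetical sketch of that unit of job, an (index, source, subset of partitions) triple, and of the plan a control plane would have to maintain. The names are invented.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of the indivisible unit of work described above:
/// one index, one of its sources, and the subset of the source's
/// partitions this task is responsible for.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct IndexingTask {
    index_id: String,
    source_id: String,
    partition_ids: Vec<String>,
}

/// The assignment the control plane has to produce and maintain,
/// re-assigning the affected tasks when a node is lost.
#[derive(Debug, Default)]
struct IndexingPlan {
    /// node id -> tasks running on that node
    assignments: HashMap<String, Vec<IndexingTask>>,
}
```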

fulmicoton avatar Mar 21 '22 06:03 fulmicoton

This can be divided into two parts:

  1. Form an indexer cluster. This can easily be obtained with gossip. However, it is important to distinguish between indexers and searchers.

  2. Distribute the logs to the nodes in the indexer cluster. The logs can be partitioned based on a hash, as in the sketch below.
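
Here is a minimal sketch of the hash-based distribution mentioned in point 2, assuming a fixed list of indexer node ids. It uses plain modulo hashing for brevity; a real scheduler would likely prefer consistent or rendezvous hashing to limit reshuffling when nodes come and go.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assigns a log partition to one of the indexer nodes by hashing its id.
/// Returns None when no indexer is available.
fn assign_partition(partition_id: &str, indexer_node_ids: &[String]) -> Option<String> {
    if indexer_node_ids.is_empty() {
        return None;
    }
    let mut hasher = DefaultHasher::new();
    partition_id.hash(&mut hasher);
    let idx = (hasher.finish() as usize) % indexer_node_ids.len();
    Some(indexer_node_ids[idx].clone())
}
```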

sunisdown avatar Jun 29 '22 10:06 sunisdown

We are working on this, we will share more details about the design soon :)

On the cluster formation, we already use chitchat (based on the Scuttlebutt anti-entropy gossip algorithm). The good thing is that gossip gives us the metadata needed to know the indexer node states.

On the distribution of indexing, our goal is to have a control plane that schedules the indexing tasks (per index/source/partition set) on the indexers. Note that the distribution of the tasks will be oversimplified at the start: the user will probably be able to configure it manually, and we won't handle the control plane's high availability in this first version.
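
As an illustration of what an "oversimplified" first version could look like, here is a hypothetical round-robin scheduler over the indexers; the task tuple and node ids are invented for the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical first-cut scheduler: spread (index, source, partition set)
/// tasks over the available indexers in round-robin fashion.
fn schedule_round_robin(
    tasks: &[(String, String, Vec<String>)], // (index_id, source_id, partition_ids)
    indexer_node_ids: &[String],
) -> HashMap<String, Vec<(String, String, Vec<String>)>> {
    assert!(!indexer_node_ids.is_empty(), "no indexer available");
    let mut plan: HashMap<String, Vec<(String, String, Vec<String>)>> = HashMap::new();
    for (i, task) in tasks.iter().enumerate() {
        let node_id = &indexer_node_ids[i % indexer_node_ids.len()];
        plan.entry(node_id.clone()).or_default().push(task.clone());
    }
    plan
}
```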

fmassot avatar Jun 29 '22 11:06 fmassot

@fmassot Glad to hear it, thanks for your reply.

On the distribution of indexing

Are you going to store the routing information in the metastore and let the indexing task read that information from the metastore?

sunisdown avatar Jun 29 '22 15:06 sunisdown

Are you going to store the routing information in the metastore and let the indexing task read that information from the metastore?

Yes. The typical example is a Kafka source: you may want to index the first 10 partitions on the first indexer, the next 10 on the second indexer, and so on. This will be defined in the source config that is saved in the metastore.
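
For illustration only, such an assignment could be modeled roughly as follows; the field names are invented and do not reflect Quickwit's actual source config schema.

```rust
/// Hypothetical sketch of a pinning rule attached to a source config and
/// stored in the metastore. In a YAML source config this might look
/// roughly like (invented field names):
///
///   source_id: kafka-logs
///   assignments:
///     - indexer_node_id: indexer-1
///       partition_ids: ["0", "1", "2"]
///     - indexer_node_id: indexer-2
///       partition_ids: ["10", "11", "12"]
#[derive(Debug, Clone)]
struct PartitionAssignment {
    indexer_node_id: String,
    partition_ids: Vec<String>,
}

#[derive(Debug, Clone)]
struct SourceAssignmentConfig {
    source_id: String,
    assignments: Vec<PartitionAssignment>,
}
```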

fmassot avatar Jun 30 '22 13:06 fmassot

It would be great to have a control plane to schedule the indexing tasks.

Just to share my thoughts about the config:

  1. For a large cluster, we would like the indexer cluster to auto-scale, e.g. scale out at peak hours and scale in at night. Quickwit should have some automated scheduling for the routing, such as partitioning the log data by range or hash, so that the scheduler can route the log data to indexers automatically.
  2. Range or hash may not be enough in some cases, like data skew, so we need a way to rebalance the data by config, i.e. route a partition to a designated node. Even with a config for routing, we also need some fallback behavior: for example, if we scale in the cluster, a node may be terminated, and its partitions need to fall back to other nodes.

So I think Quickwit should have two routing layers. The first layer is stored in metadata: if a partition hits a rule in the first layer, it follows that config. The second layer is a routing algorithm (hash/range): if a partition has no config, it follows the routing algorithm.
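
Here is a minimal sketch of that two-layer idea, assuming hypothetical types: explicit rules read from the metastore are consulted first, and a hash-based assignment over the live indexers is used as the fallback (also covering the case where a designated node has been scaled in).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Layer 1: explicit routing rules stored in the metastore
/// (partition id -> designated indexer node).
/// Layer 2: fallback hash-based assignment over the live indexers.
fn route_partition(
    partition_id: &str,
    explicit_rules: &HashMap<String, String>,
    live_indexer_node_ids: &[String],
) -> Option<String> {
    // Layer 1: follow the configured rule if the designated node is still alive.
    if let Some(node_id) = explicit_rules.get(partition_id) {
        if live_indexer_node_ids.contains(node_id) {
            return Some(node_id.clone());
        }
        // Designated node is gone (e.g. scaled in): fall through to layer 2.
    }
    // Layer 2: hash the partition over the live indexers.
    if live_indexer_node_ids.is_empty() {
        return None;
    }
    let mut hasher = DefaultHasher::new();
    partition_id.hash(&mut hasher);
    let idx = (hasher.finish() as usize) % live_indexer_node_ids.len();
    Some(live_indexer_node_ids[idx].clone())
}
```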

sunisdown avatar Jul 01 '22 05:07 sunisdown

We've designed a solution for distributed ingestion and indexing. Design docs are currently on Notion, but we will eventually commit them to our internal docs.

guilload avatar Aug 15 '23 13:08 guilload