
INSERT DELAYED

Open sanikolaev opened this issue 1 year ago • 8 comments

Currently mass inserts into a Manticore real-time table require batching for better performance.

The task is to implement the INSERT DELAYED SQL command which would work as follows:

  • INSERT DELAYED ... gets routed to Buddy
  • Buddy saves the documents internally and returns OK immediately
  • When the batch becomes big enough, Buddy sends a bulk insert to Manticore
  • Buddy can be aware of how loaded the daemon/server is and decide to:
    • suspend accepting new inserts
    • send batches to the daemon with concurrency N where N depends on the load
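The flow above can be sketched in a few lines. This is a hypothetical illustration in Python (Buddy itself is written in PHP); the class and method names are invented for the sketch, and the load-aware concurrency part is only hinted at in a comment:

```python
import threading

class DelayedInsertBuffer:
    """Sketch of the proposed INSERT DELAYED handling: accept documents,
    acknowledge immediately, and flush them in bulk once the batch is big enough."""

    def __init__(self, flush_size, send_batch):
        self.flush_size = flush_size    # how many docs to accumulate before flushing
        self.send_batch = send_batch    # callable performing the real bulk insert
        self.docs = []
        self.lock = threading.Lock()

    def insert(self, doc):
        """Accept one document and return "OK" right away (fire-and-forget)."""
        with self.lock:
            self.docs.append(doc)
            if len(self.docs) >= self.flush_size:
                batch, self.docs = self.docs, []
                # In the proposal this would happen asynchronously, with
                # concurrency N chosen based on the daemon's current load.
                self.send_batch(batch)
        return "OK"

sent = []
buf = DelayedInsertBuffer(flush_size=3, send_batch=sent.append)
for i in range(7):
    buf.insert({"id": i})
# Two full batches of 3 have been flushed; one document is still buffered.
```

The key property is that the caller never waits for the bulk insert: it only pays the cost of appending to an in-memory list.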

In theory it should give close-to-best throughput and maximum ease of use: the user just sends inserts one by one, and the throughput is the same as if they sent them in batches with high concurrency.

Here's a blog post which explains how to add features like this: https://manticoresearch.com/blog/manticoresearch-buddy-pluggable-design/

Once done, the documentation is to be updated as well - https://github.com/manticoresoftware/manticoresearch/tree/master/manual

sanikolaev avatar Oct 07 '23 06:10 sanikolaev

Does that mean Buddy will manage its own state? Answering "OK" without a real attempt to insert may be confusing. The described functionality also looks like working with message queues, e.g. Apache Kafka or NATS.

AbstractiveNord avatar Oct 08 '23 10:10 AbstractiveNord

Answering "OK" without real attempt to insert may be confusing

Valid point. We should make it clear in the docs that it would work in fire-and-forget mode. If it's important not to lose data, the user should prefer simple batching.

sanikolaev avatar Oct 10 '23 16:10 sanikolaev

Valid point. We should make it clear in the docs that it would work in fire-and-forget mode. If it's important not to lose data, the user should prefer simple batching.

As a typical user, if I need batching, I'll use message queues. MQs solve a ton of problems:

  1. My team and I know what to expect from MQs. I'm not sure what to expect from a PHP sidecar becoming stateful.
  2. If Buddy answers OK to my insert but the insert later fails due to a bad schema, how will I be notified? Even if that failure is logged, I will have a hard time tracing the problem.
  3. I'm definitely concerned about a PHP daemon working with the disk under high enough load.
  4. Using MQs, I can precisely tune my data ingestion pipeline: batch size, timeouts, everything. Can I do this with Buddy?
  5. Currently Manticore Search doesn't provide any authentication, which forces users to rely on MQs or some other backend for that.
  6. Correct me if I'm wrong, but Manticore Search allows me to insert data directly into the shards of a distributed table, which makes it sensible to shard even at the delivery level, i.e. at the message-queue level.

AbstractiveNord avatar Oct 11 '23 04:10 AbstractiveNord

Addendum: The chunk system of Manticore Search will look very familiar to ClickHouse users. For the same behavior as the INSERT DELAYED proposal, ClickHouse provides a fire-and-forget table engine, the Buffer engine. Using MQs such as Apache Kafka, RabbitMQ, or NATS JetStream is the common approach to inserting data for most ClickHouse users.

AbstractiveNord avatar Oct 11 '23 04:10 AbstractiveNord

if I need batching, I'll use message queues

There seems to be some misunderstanding here. When I refer to "batching," I'm talking about the format insert into t values(...),(...) (or the equivalent via JSON). This has nothing to do with message queues. Something has to execute this query. If it's done behind a message queue, that's great. However, Manticore Search isn't concerned with that (unless we introduce some fundamental changes). It's just that each individual insert into command, regardless of how many documents are in the batch, results in a single segment in a RAM chunk. This segment then has to be merged. Given a constant ingestion rate of X documents per second, if there's only one document in a batch, the RAM chunk will need to merge segments more often. Conversely, with a larger batch, merging occurs less frequently.
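To make the "batching" terminology concrete, here is a minimal Python sketch of building one multi-row statement of the kind mentioned above. The table name and documents are made up, and the quoting is deliberately naive for illustration; a real client should use its driver's escaping:

```python
def build_batch_insert(table, rows):
    """Build a single multi-row INSERT statement. All rows land in one
    statement, so the daemon creates one new RAM-chunk segment for the
    whole batch instead of one segment per document."""
    values = ",".join(
        "(" + ",".join(
            str(v) if isinstance(v, (int, float)) else "'" + str(v) + "'"
            for v in row
        ) + ")"
        for row in rows
    )
    return f"insert into {table} values{values}"

sql = build_batch_insert("t", [(1, "doc one"), (2, "doc two")])
# → "insert into t values(1,'doc one'),(2,'doc two')"
```

Whether this statement arrives directly from an application or from a consumer draining a message queue makes no difference to the daemon: it is the per-statement document count that controls how often segments must be merged.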

While a deeper integration with e.g. Kafka is logical, it's a topic for another discussion. Implementing this with a data-on-disk guarantee is more challenging than simply accumulating a batch in PHP and executing insert into t values(...),(...) when the batch size reaches X or after a certain timeout.

Integration with Kafka without the guarantee is also possible through Buddy, but that would be a separate task. The key point is that data can only be considered safely written AFTER the "insert" operation is complete. Providing any guarantees before that would require fundamental changes to the write logic in the daemon.

sanikolaev avatar Oct 11 '23 05:10 sanikolaev

As discussed on the dev call of Oct 12th 2023, it makes sense to check whether there will be any benefit to routing INSERT DELAYED through Buddy at all, given the overhead.

sanikolaev avatar Oct 12 '23 08:10 sanikolaev

As discussed on the dev call of Oct 12th 2023, it makes sense to check whether there will be any benefit to routing INSERT DELAYED through Buddy at all, given the overhead.

Please check the risks of OOM inside Docker containers. I'm still not sure about having a configurable memory split between searchd and Buddy under high enough concurrent insert load. The size of the memory footprint is also interesting.

AbstractiveNord avatar Oct 12 '23 08:10 AbstractiveNord

Please check risks of OOM inside Docker containers

Currently Manticore Search's max packet size is 128MB, so it doesn't make sense to accumulate a buffer larger than 128MB anywhere outside searchd (in Buddy or the user's app). So I don't see a significant source of memory overconsumption here.
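Since the daemon's packet size caps what a single insert can carry, any buffering layer can bound its own memory by flushing before the serialized batch would exceed that limit. A trivial sketch of that check, assuming the 128MB figure mentioned above:

```python
MAX_PACKET = 128 * 1024 * 1024  # Manticore's max packet size per the comment above

def should_flush(current_batch_bytes, next_doc_bytes, limit=MAX_PACKET):
    """Return True when appending the next document would push the
    serialized batch past the daemon's packet limit, so the buffer
    must be flushed before accepting it."""
    return current_batch_bytes + next_doc_bytes > limit

# e.g. a 120MB batch plus a 10MB document must be flushed first
assert should_flush(120 * 1024**2, 10 * 1024**2)
```

With this bound in place, the worst-case buffer held outside searchd stays close to one packet's worth of data, which matches the point about memory overconsumption.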

sanikolaev avatar Oct 16 '23 07:10 sanikolaev