Create BufferImpl
The Buffer is responsible for buffering writes into Edge that have yet to be persisted to object storage as Parquet files. The buffer should be queryable from DataFusion and, if configured, it should make writes durable to a write-ahead log (WAL) on locally attached disk (#24557).
The buffer is also responsible for keeping the in-memory Catalog, which is the definition of what databases, tables, and columns exist. When writes come into the system, the buffer is responsible for validating them against the existing schema, or for updating the in-memory catalog schema to accommodate them.
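To make the validate-or-update behavior concrete, here's a minimal sketch of what the schema check could look like. The `ColumnType`, `TableDefinition`, and `check_or_add_column` names are illustrative placeholders, not the actual catalog types in influxdb3_write:

```rust
use std::collections::HashMap;

/// Hypothetical column types supported by the buffer.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ColumnType {
    Tag,
    F64,
    I64,
    Bool,
    String,
    Time,
}

/// Hypothetical in-memory table definition held by the catalog.
#[derive(Default)]
struct TableDefinition {
    columns: HashMap<String, ColumnType>,
}

impl TableDefinition {
    /// Validate an incoming column against the known schema, or extend the
    /// schema if the column is new. Returns an error on a type conflict.
    fn check_or_add_column(&mut self, name: &str, column_type: ColumnType) -> Result<(), String> {
        match self.columns.get(name) {
            Some(existing) if *existing == column_type => Ok(()),
            Some(existing) => Err(format!(
                "column '{}' is {:?}, but the write sent {:?}",
                name, existing, column_type
            )),
            None => {
                // New column: update the in-memory catalog so later writes
                // in this segment validate against it.
                self.columns.insert(name.to_string(), column_type);
                Ok(())
            }
        }
    }
}
```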
The buffer should keep data organized into segments, each of which will map to a single WAL file (if configured).
The server that uses the buffer will be configured with a max memory size and a rollover time that determine when the open segment gets closed and persisted and a new segment gets opened. For example: buffer data until 2GB has accumulated or 1 hour has passed, whichever comes first. Generally we'd expect users to rely on the time interval, but all data in the buffer must fit into RAM, so the size constraint is meant as a fallback if write throughput is too high.
When rollover happens, the server process will persist the catalog (if it has any updates in this segment) and persist all buffered data from the segment as Parquet files.
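As a sketch of how the rollover trigger might be expressed (all names here are hypothetical; the real limits would come from the server's configuration):

```rust
use std::time::{Duration, Instant};

/// Hypothetical rollover configuration: whichever limit is hit first closes
/// the open segment.
struct SegmentConfig {
    max_buffer_bytes: usize,        // e.g. 2 GB
    max_segment_duration: Duration, // e.g. 1 hour
}

/// Hypothetical bookkeeping for the currently open segment.
struct OpenSegment {
    opened_at: Instant,
    buffered_bytes: usize,
}

impl OpenSegment {
    /// True when the segment should be closed, persisted, and replaced
    /// with a newly opened segment.
    fn should_rollover(&self, config: &SegmentConfig) -> bool {
        self.buffered_bytes >= config.max_buffer_bytes
            || self.opened_at.elapsed() >= config.max_segment_duration
    }
}
```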
A sketch of the buffer interface is here, along with the ChunkContainer, which the BufferImpl will implement so that it is queryable:
https://github.com/influxdata/influxdb/blob/5831cf8cee706302d39d2f15b271a344dc3eda44/influxdb3_write/src/lib.rs#L52-L108
For the underlying implementation, I have a couple of ideas.
- One is to buffer the data as it arrives in whatever structure is simplest (but not necessarily memory efficient) and then periodically convert it into in-memory Parquet data, compacting as we go (within a segment only), either on demand for query or once writes to a table cross a threshold.
- The other would be to keep data in a tree structure (db -> table -> partition) and use a string interner for any string data, then convert that to RecordBatch or Parquet on demand for query or at persist time.
I have a preference for the Parquet style, as I think it amortizes the cost of creating Parquet data over time, so that when it comes time to persist, there is very little work left to do and it's mostly just pushing Parquet data up to object storage.
That said, I'll only be able to tell for sure by building a spike and running some load testing on my laptop, which I plan to do.
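A rough sketch of the first approach, using the arrow and parquet crates: periodically convert whatever rows have accumulated for a table into an in-memory Parquet file, so that persisting is mostly just uploading bytes. The `rows_to_parquet_bytes` function and the example schema are hypothetical and only meant to show the shape of the work, not the real buffer code:

```rust
use std::sync::Arc;

use arrow::array::{Int64Array, StringArray, TimestampNanosecondArray};
use arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

/// Convert a batch of buffered rows into an in-memory Parquet file.
/// In the real buffer this would run periodically per table within a
/// segment, compacting newly arrived rows into Parquet bytes that are
/// cheap to push to object storage at persist time.
fn rows_to_parquet_bytes(batch: RecordBatch) -> parquet::errors::Result<Vec<u8>> {
    let mut bytes = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut bytes, batch.schema(), None)?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(bytes)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A tiny example batch standing in for buffered writes to one table.
    let schema = Arc::new(Schema::new(vec![
        Field::new("host", DataType::Utf8, false),
        Field::new("usage", DataType::Int64, false),
        Field::new(
            "time",
            DataType::Timestamp(TimeUnit::Nanosecond, None),
            false,
        ),
    ]));
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(StringArray::from(vec!["a", "b"])),
            Arc::new(Int64Array::from(vec![10_i64, 20])),
            Arc::new(TimestampNanosecondArray::from(vec![1_i64, 2])),
        ],
    )?;

    let parquet_bytes = rows_to_parquet_bytes(batch)?;
    println!("in-memory parquet file is {} bytes", parquet_bytes.len());
    Ok(())
}
```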
### Tasks
- [ ] https://github.com/influxdata/influxdb/issues/24571
- [ ] https://github.com/influxdata/influxdb/issues/24572
- [ ] https://github.com/influxdata/influxdb/issues/24573
- [ ] https://github.com/influxdata/influxdb/issues/24574
- [ ] https://github.com/influxdata/influxdb/issues/24575