
Improve query performance over high-cardinality fields

Open nblumhardt opened this issue 2 years ago • 2 comments

Signal indexes provide a great performance boost for queries that incorporate a common predicate (the signal). For example, an Errors signal with a filter like @Level = 'Error' can very effectively narrow the scope of searches among errors.

The kinds of fields like @Level that work well as signal indexes have low cardinality: there are only a handful of different levels, so creating a signal for each one is reasonable and effective.

High-cardinality fields like RequestId are at the opposite end of the spectrum: there are many possible values for RequestId, so creating a signal for each one is entirely impractical (and would be incredibly inefficient given the implementation of signal indexing today).
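To make the contrast concrete, here is a minimal sketch, not Seq's actual implementation. It models a signal index as a precomputed set of event positions matching a predicate; `build_signal_index` and the sample events are hypothetical names and data chosen for illustration.

```python
# Hypothetical model: a "signal index" is the set of event positions
# that satisfy the signal's predicate. This is an assumption made for
# illustration, not a description of Seq's storage engine.

def build_signal_index(events, predicate):
    """Return the positions of all events matching the predicate."""
    return {i for i, event in enumerate(events) if predicate(event)}

events = [
    {"@Level": "Error", "RequestId": "r1"},
    {"@Level": "Information", "RequestId": "r2"},
    {"@Level": "Error", "RequestId": "r3"},
    {"@Level": "Information", "RequestId": "r4"},
]

# One signal per level is cheap: there are only a handful of levels.
errors_signal = build_signal_index(events, lambda e: e["@Level"] == "Error")

# Count how many signals each field would need for full coverage.
distinct_levels = {e["@Level"] for e in events}
distinct_request_ids = {e["RequestId"] for e in events}
```

In this toy data, `@Level` needs two signals while `RequestId` needs one per event, and that gap grows with the size of the event stream.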

Seq has some optimizations for searching on high-cardinality fields, but these currently tend to be I/O bound, and lag a long way behind the efficiency of searches that can be accelerated by signals.

We'd like to add storage-level functionality, such as new index types, to speed up searches on high-cardinality fields by reducing I/O requirements.
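One possible shape for such an index type, sketched under assumptions since no design has been published here: a map from each field value to the positions of the events that contain it. The function names and the "events read" counter are hypothetical and only illustrate how a lookup can avoid reading most events.

```python
from collections import defaultdict

def build_value_index(events, field):
    """Map each distinct value of `field` to the positions where it occurs."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        if field in event:
            index[event[field]].append(position)
    return index

def scan_search(events, field, value):
    """Unindexed search: every event must be read and tested."""
    matches = [e for e in events if e.get(field) == value]
    return matches, len(events)  # (results, events read)

def indexed_search(events, index, value):
    """Indexed search: only events listed under the value are read."""
    positions = index.get(value, [])
    return [events[p] for p in positions], len(positions)

# Synthetic data: 10,000 events spread over 1,000 request ids.
events = [{"RequestId": f"req-{i % 1000}", "Seq": i} for i in range(10_000)]
request_index = build_value_index(events, "RequestId")

scan_hits, scan_reads = scan_search(events, "RequestId", "req-42")
index_hits, index_reads = indexed_search(events, request_index, "req-42")
```

Both searches return the same events, but the indexed lookup touches only the matching ones. The trade-off is the extra storage needed to hold the index itself.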

The next step for this will be the creation of an RFC with our proposed implementation.

nblumhardt · Feb 02 '22 04:02

@nblumhardt, any progress on this? Once again I found myself really wishing for this. I needed to search for a specific property value (like RequestId) that is obviously unindexed and exists in only about 5 events out of millions. The search took a few minutes to execute. I think a feature that lets me list the properties I know are used in searches, and for which I am willing to "pay the price" of storing their values in indexes, could be really useful.

DanAvni · Jun 26 '22 15:06

I agree, @DanAvni. We are working on the architectural changes that will support improvements like high-cardinality indexes. At this stage we don't have an RFC ready to publish.

liammclennan · Jun 26 '22 22:06