elasticsearch
Track segment information in (mixed) dense vector search
Related to #106591, a good point was raised: if there are bugs or concerns about a given kNN query running against a "mixed" set of segments (e.g. partly `flat` and partly `hnsw`), it would be hard to debug where the problem comes from.
To this end it'd be useful to have some way to track segment info in this context and e.g. be able to relate failures / warnings / slowness to specific segments.
Pinging @elastic/es-search (Team:Search)
One thing we could do is start by adding information from Lucene `SegmentInfo#codec` within the ES `Engine` class, to expose in the Index Segments API which kinds of underlying data structures each segment uses (including its `KnnVectorsFormat`).
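As a rough illustration of what per-segment format information would make possible, here is a minimal sketch assuming a hypothetical `SegmentFormatInfo` record (segment name, vector format, doc count) that a segments-API-style response could expose. In a real implementation the format name would be derived from the segment's codec; here it is passed in by hand, and the format labels are just the mapping types mentioned in this issue. The sketch sums document counts per format so that a "mixed" index is easy to spot.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-segment summary (not an existing Elasticsearch type).
// In practice vectorFormat would be read from the segment's codec.
record SegmentFormatInfo(String segmentName, String vectorFormat, int docCount) {}

final class SegmentFormatSummary {
    private SegmentFormatSummary() {}

    // Sums document counts per vector format; TreeMap keeps output ordered by format name.
    static Map<String, Integer> docsPerFormat(List<SegmentFormatInfo> segments) {
        Map<String, Integer> counts = new TreeMap<>();
        for (SegmentFormatInfo s : segments) {
            counts.merge(s.vectorFormat(), s.docCount(), Integer::sum);
        }
        return counts;
    }

    // An index is "mixed" when its segments use more than one vector format.
    static boolean isMixed(List<SegmentFormatInfo> segments) {
        return docsPerFormat(segments).size() > 1;
    }
}
```

For segments `_0` (`hnsw`, 1000 docs), `_1` (`int8_hnsw`, 200 docs) and `_2` (`hnsw`, 50 docs), `docsPerFormat` returns `{hnsw=1050, int8_hnsw=200}` and `isMixed` returns `true`, which is the kind of signal that would help relate a slow or failing query to specific segments.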
Another option is to enable tracking vector formats in `AbstractKnnVectorQuery#explain` so that the `Explanation` also contains the per-doc vector format. This would help in situations where mappings have been updated (e.g. from `hnsw` to `int8_hnsw`) but most of the kNN query results still come from segments with pre-update formats.
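To report a per-doc vector format, an explanation first has to know which segment a top-level doc id falls in. The sketch below shows one way to do that, assuming each segment's starting doc id (`docBases`, strictly ascending, starting at 0, no empty segments) and its vector format are known; the binary search is similar in spirit to Lucene's `ReaderUtil.subIndex`. The class and method names (`PerDocVectorFormat`, `describe`, `fractionFromFormat`) are hypothetical and not part of Lucene or Elasticsearch.

```java
import java.util.Arrays;

// Hypothetical helper: maps top-level doc ids to segments and their vector formats.
final class PerDocVectorFormat {
    private PerDocVectorFormat() {}

    // Returns the index of the segment containing `doc`.
    // docBases must be strictly ascending with docBases[0] == 0.
    static int segmentFor(int doc, int[] docBases) {
        int idx = Arrays.binarySearch(docBases, doc);
        // Exact match: doc is the first doc of that segment.
        // Otherwise idx == -(insertionPoint) - 1, and the segment is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Builds an explanation-style line naming the segment and its vector format.
    static String describe(int doc, int[] docBases, String[] segmentNames, String[] vectorFormats) {
        int seg = segmentFor(doc, docBases);
        return "doc " + doc + ": segment " + segmentNames[seg] + ", vector format " + vectorFormats[seg];
    }

    // Fraction of hits served by segments using the given vector format.
    static double fractionFromFormat(int[] hitDocs, int[] docBases, String[] vectorFormats, String format) {
        if (hitDocs.length == 0) {
            return 0.0;
        }
        int matching = 0;
        for (int doc : hitDocs) {
            if (vectorFormats[segmentFor(doc, docBases)].equals(format)) {
                matching++;
            }
        }
        return (double) matching / hitDocs.length;
    }
}
```

With `docBases = {0, 1000, 1200}` and formats `{hnsw, int8_hnsw, hnsw}`, the hits `{5, 999, 1100, 1250}` give `fractionFromFormat(..., "hnsw") == 0.75`: after a mapping change to `int8_hnsw`, three of the four results would still come from segments written with the old format.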