Mariano Tepper

Results: 8 comments by Mariano Tepper

Why not just divide the result by `vectors.size()`?

You can use this [method](https://github.com/datastax/jvector/blob/d8e9cb16fa5dc4ff6afd0af83df9074279696238/jvector-base/src/main/java/io/github/jbellis/jvector/quantization/KMeansPlusPlusClusterer.java#L436).

@tlwillke I did some measurements for ada002-100k, using 192 PQ segments. The dataset contains 99562 vectors. According to the ramBytesUsed estimate in PQVectors, they should take 19.74 MB. Roughly speaking, this...

Slight refactor in Grid.runOneGraph so that the compressed vectors are loaded only when needed.

Initial benchmarking shows that index construction got a bit slower (~10%), which is not surprising. Even if this PR moves things in the right direction conceptually, in practice it does...

The most recent commits overhaul the strategy used to ensure that edges in the graph are unique. Now, each adjacency list is sorted by node ID in ascending order. NodeArray...
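
To illustrate the idea of keeping each adjacency list sorted by node ID so that duplicates are easy to reject, here is a minimal sketch. It is not the actual NodeArray code: the class `SortedAdjacencyList` and its methods are hypothetical names, and it assumes a plain `int[]` of node IDs with no scores attached.

```java
import java.util.Arrays;

// Hypothetical sketch (not jvector's NodeArray): an adjacency list that is
// kept sorted by node ID in ascending order, so duplicate edges can be
// detected with a binary search before inserting.
class SortedAdjacencyList {
    private int[] nodes = new int[4];
    private int size = 0;

    /** Inserts nodeId in sorted position; returns false if it is already present. */
    boolean insert(int nodeId) {
        int idx = Arrays.binarySearch(nodes, 0, size, nodeId);
        if (idx >= 0) {
            return false; // edge already exists, keep the list unique
        }
        int insertAt = -(idx + 1); // binarySearch encodes the insertion point
        if (size == nodes.length) {
            nodes = Arrays.copyOf(nodes, size * 2); // grow the backing array
        }
        // Shift the tail one slot to the right to make room.
        System.arraycopy(nodes, insertAt, nodes, insertAt + 1, size - insertAt);
        nodes[insertAt] = nodeId;
        size++;
        return true;
    }

    boolean contains(int nodeId) {
        return Arrays.binarySearch(nodes, 0, size, nodeId) >= 0;
    }

    int size() {
        return size;
    }

    /** Returns a copy of the node IDs in ascending order. */
    int[] toArray() {
        return Arrays.copyOf(nodes, size);
    }
}
```

With this layout a membership check costs O(log n) and an insert costs O(n) because of the shift, which is one plausible trade-off for short, bounded-degree adjacency lists.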

> Also, calling this a “disk-based” solution seems a bit misleading if the graph still has to be built fully in memory. That’s often the core problem people are trying...

> What are the improvements there? Is it adjusting the IO patterns for scoring or is it because rescoring utilizes LVQ? (e.g. scalar quantization centered on a centroid...which is a...