Mariano Tepper comments

Results: 8 comments by Mariano Tepper

How about add a mean method like the one of sum(List<VectorFloat>)?

Why not just divide the result by `vectors.size()`?
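As a rough illustration of that suggestion, here is a minimal sketch that takes the element-wise sum and scales it by `1 / vectors.size()`. It uses plain `float[]` arrays instead of `VectorFloat`, and the class and method names (`VectorMean`, `sum`, `mean`) are hypothetical stand-ins, not JVector's API.

```java
import java.util.List;

// Hypothetical helper: plain float[] stands in for VectorFloat.
final class VectorMean {
    // Element-wise sum of equal-length vectors.
    static float[] sum(List<float[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one vector");
        }
        float[] total = new float[vectors.get(0).length];
        for (float[] v : vectors) {
            if (v.length != total.length) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            for (int i = 0; i < total.length; i++) {
                total[i] += v[i];
            }
        }
        return total;
    }

    // Mean = sum divided by the number of vectors.
    static float[] mean(List<float[]> vectors) {
        float[] result = sum(vectors);
        float inverseCount = 1.0f / vectors.size();
        for (int i = 0; i < result.length; i++) {
            result[i] *= inverseCount;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] m = mean(List.of(new float[] {1f, 2f}, new float[] {3f, 6f}));
        System.out.println(m[0] + ", " + m[1]);
    }
}
```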

How about add a mean method like the one of sum(List<VectorFloat>)?

You can use this [method](https://github.com/datastax/jvector/blob/d8e9cb16fa5dc4ff6afd0af83df9074279696238/jvector-base/src/main/java/io/github/jbellis/jvector/quantization/KMeansPlusPlusClusterer.java#L436).

Bring back the fused graph index

@tlwillke I did some measurements for ada002-100k, using 192 PQ segments. The dataset contains 99562 vectors. According to the ramBytesUsed estimate in PQVectors, they should take 19.74 MB. Roughly speaking, this...
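For a sense of where a figure of that size can come from, here is a back-of-the-envelope sketch. It is not how `ramBytesUsed` is implemented; it assumes one byte per segment code (256 centroids per segment), 1536-dimensional ada002 embeddings, and 4-byte float centroids, and it ignores object and array overhead. The class and method names are hypothetical.

```java
// Hypothetical estimate; assumes 1-byte codes and float32 centroids.
final class PqMemoryEstimate {
    // One byte per segment for every encoded vector.
    static long codeBytes(long vectorCount, int segments) {
        return vectorCount * segments;
    }

    // Codebook: segments x clusters centroids, each of (dimension / segments) floats.
    static long codebookBytes(int segments, int clusters, int dimension) {
        int subvectorDimension = dimension / segments;
        return (long) segments * clusters * subvectorDimension * Float.BYTES;
    }

    public static void main(String[] args) {
        long codes = codeBytes(99_562L, 192);          // 19,115,904 bytes
        long codebook = codebookBytes(192, 256, 1536); // 1,572,864 bytes
        long total = codes + codebook;                 // 20,688,768 bytes
        System.out.printf("codes=%d codebook=%d total=%d (%.2f MiB)%n",
                codes, codebook, total, total / (1024.0 * 1024.0));
    }
}
```

Under these assumptions the total is about 19.73 MiB, in the same neighborhood as the quoted estimate, though the actual estimate may count things differently.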

Bring back the fused graph index

Slight refactor in Grid.runOneGraph so that the compressed vectors are loaded only when needed.

Ensure that no node duplicates exist in the adjacency list of any node.

Initial benchmarking shows that index construction got a bit slower (~10%), which is not surprising. Even if this PR moves things in the right direction conceptually, in practice it does...

Ensure that no node duplicates exist in the adjacency list of any node.

The most recent commits overhaul the strategy used to ensure that edges in the graph are unique. Now, each adjacency list is sorted by node ID in ascending order. NodeArray...
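The idea of keeping each adjacency list sorted by node ID can be sketched as follows: a binary search finds either the existing entry (so the duplicate is rejected) or the insertion point that preserves ascending order. This is a simplified illustration with a hypothetical `SortedNeighborList` class; it is not the actual NodeArray code and it leaves out scores entirely.

```java
import java.util.Arrays;

// Hypothetical sketch: node IDs kept in ascending order, no duplicates.
final class SortedNeighborList {
    private int[] nodes = new int[4];
    private int size = 0;

    // Returns true if the node was added, false if it was already present.
    boolean insert(int node) {
        int pos = Arrays.binarySearch(nodes, 0, size, node);
        if (pos >= 0) {
            return false; // duplicate edge, keep the list unchanged
        }
        int insertionPoint = -(pos + 1);
        if (size == nodes.length) {
            nodes = Arrays.copyOf(nodes, size * 2);
        }
        // Shift the tail right by one slot to make room.
        System.arraycopy(nodes, insertionPoint, nodes, insertionPoint + 1, size - insertionPoint);
        nodes[insertionPoint] = node;
        size++;
        return true;
    }

    boolean contains(int node) {
        return Arrays.binarySearch(nodes, 0, size, node) >= 0;
    }

    int size() {
        return size;
    }

    int[] toArray() {
        return Arrays.copyOf(nodes, size);
    }
}
```

The trade-off in this sketch is that membership checks take O(log n) while each insert pays for shifting part of the array, which fits the modest construction slowdown one would expect from enforcing uniqueness.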

Integrate a JVector codec for KNN searches

> Also, calling this a “disk-based” solution seems a bit misleading if the graph still has to be built fully in memory. That’s often the core problem people are trying...

Integrate a JVector codec for KNN searches

> What are the improvements there? Is it adjusting the IO patterns for scoring or is it because rescoring utilizes LVQ? (e.g. scalar quantization centered on a centroid...which is a...