jvector issues

(Why not) support building larger-than-memory indexes

1

The DiskANN paper describes building larger-than-memory indexes by partitioning the dataset and then adding vectors to multiple partitions, then combining the graphs. This is 2.5x slower than building the graph...

jbellis

Use Memory Segment API for aligned vector loads.

8

Hi all, most of the vectorized code in [SimdOps.java](https://github.com/jbellis/jvector/blob/main/jvector-twenty/src/main/java/io/github/jbellis/jvector/vector/SimdOps.java) uses the `fromArray` API to load contents into vectors. With JDK 20+, the Vector API added support for loading and...

jatin-bhateja

Clean up to index builder api

1

The `GraphIndexBuilder` API can be used in two ways: for live indexing or for bulk indexing. We should enforce checks in the API so that users don't call it incorrectly or...

tjake

Investigate PQ speed ups by reducing distance calculations

At this point, profiles of our PQ show time going almost entirely to distance computations. Barring large parameter changes or a paradigm shift in how we quantize, it seems like...

jkni

Automate releases as GitHub workflow

1

We should automate the release process once we're comfortable with the results. I'd prefer a workflow that runs when an appropriate tag is pushed, but I'm open to other options.

jkni

General documentation and API index

3

First off, great work! --- It'd be very helpful if there were general documentation which helped map the theory and concepts to the class hierarchy or the main facades. That...

bahmanm

Experiment with VQ coarse quantization for PQ

CAGRA performs compression in two stages:

- **Vector Quantization (VQ)**, where k-means is applied to the full-dimensional vectors to create a codebook of coarse cluster centers
- **Product Quantization (PQ)**, where the...
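As a rough illustration of the coarse VQ stage, the sketch below assigns a vector to its nearest center in a fixed codebook. The class and method names are hypothetical (not JVector or CAGRA APIs), the codebook is hard-coded rather than trained with k-means, and the `residual` helper reflects an assumption about what a later stage might encode, since the issue preview is truncated before it describes the PQ stage.

```java
// Hypothetical sketch of a coarse VQ assignment; not JVector or CAGRA code.
class CoarseVqSketch {
    // Returns the index of the codebook center closest to v (squared Euclidean distance).
    static int nearestCentroid(float[] v, float[][] codebook) {
        int best = -1;
        float bestDist = Float.POSITIVE_INFINITY;
        for (int c = 0; c < codebook.length; c++) {
            float dist = 0f;
            for (int d = 0; d < v.length; d++) {
                float diff = v[d] - codebook[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Assumption: a follow-up stage would work on the difference between
    // the vector and its coarse center. The truncated source does not confirm this.
    static float[] residual(float[] v, float[] centroid) {
        float[] r = new float[v.length];
        for (int d = 0; d < v.length; d++) {
            r[d] = v[d] - centroid[d];
        }
        return r;
    }

    public static void main(String[] args) {
        float[][] codebook = { {0f, 0f}, {10f, 10f} }; // stand-in for k-means output
        float[] v = {9f, 12f};
        int c = nearestCentroid(v, codebook);
        float[] r = residual(v, codebook[c]);
        System.out.println(c);                 // prints 1
        System.out.println(r[0] + "," + r[1]); // prints -1.0,2.0
    }
}
```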

jbellis

Make it possible for JVector users to consume MemorySegmentReader

2

The JVector jar available through Maven Central packages JVector 11 on the regular class path, with jvector-twenty and jvector-native adding additional classes through multi-release JAR support. This means that users...

jkni

Usage of imprecise fp-model=fast.

1

With ANN search, we accept giving up accuracy for speed. Since most of the code in [jvector_simd.c](https://github.com/jbellis/jvector/blob/main/jvector-native/src/main/c/jvector_simd.c) deals in floating-point computations, it may make sense to pass fp-model=fast to the GCC...
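To make the accuracy trade-off concrete, here is a small sketch of why a relaxed floating-point mode can change results, assuming (as fast-math style options commonly do) that it lets the compiler regroup additions. It is written in Java rather than C, the class name is hypothetical, and it only demonstrates that float addition is not associative; it does not measure anything about the native JVector code.

```java
// Hypothetical demo: regrouping float additions changes the result.
class FpReorderSketch {
    // Evaluates (a + b) + c, rounding to float after each step.
    static float leftToRight(float a, float b, float c) {
        return (a + b) + c;
    }

    // Evaluates a + (b + c): the same mathematical sum with a different grouping.
    static float regrouped(float a, float b, float c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        float a = 1e8f, b = -1e8f, c = 1f;
        // a + b is exactly 0, so adding c gives 1.
        System.out.println(leftToRight(a, b, c)); // prints 1.0
        // b + c rounds back to -1e8 (float spacing near 1e8 is 8), so the sum is 0.
        System.out.println(regrouped(a, b, c));   // prints 0.0
    }
}
```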

jbhateja

RandomAccessScoreProvider with MapRandomAccessVectorValues + non-sequential IDs produces wrong centroid

3

In commit 97e523c306ae42c3e963484e320fa1c7432b5250, the `approximateCentroid()` implementation for the `BuildScoreProvider` returned from `BuildScoreProvider.randomAccessScoreProvider()` was updated to allow for non-sequential node IDs. However, the iteration only takes into account nodes with ID < `ravv.size()`....
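The failure mode described here can be sketched in isolation. The code below is a hypothetical illustration, not JVector's implementation: it uses a plain `Map<Integer, float[]>` in place of `MapRandomAccessVectorValues`, and it assumes that IDs missing from the map are simply skipped. Iterating over `0..size-1` then sees only the nodes whose IDs happen to fall in that range, while iterating over the stored vectors gives the expected centroid.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the centroid problem; not JVector's actual code.
class CentroidSketch {
    // Problematic pattern: assumes node IDs are exactly 0..size-1.
    static float[] centroidBySequentialIds(Map<Integer, float[]> vectors, int dim) {
        float[] sum = new float[dim];
        int count = 0;
        for (int id = 0; id < vectors.size(); id++) {
            float[] v = vectors.get(id);
            if (v == null) continue; // assumption: absent IDs are silently skipped
            for (int d = 0; d < dim; d++) sum[d] += v[d];
            count++;
        }
        return scale(sum, count);
    }

    // Alternative: iterate over the vectors that are actually stored.
    static float[] centroidByActualIds(Map<Integer, float[]> vectors, int dim) {
        float[] sum = new float[dim];
        for (float[] v : vectors.values()) {
            for (int d = 0; d < dim; d++) sum[d] += v[d];
        }
        return scale(sum, vectors.size());
    }

    // Divides the component sums by the number of vectors that contributed.
    private static float[] scale(float[] sum, int count) {
        if (count == 0) return sum;
        for (int d = 0; d < sum.length; d++) sum[d] /= count;
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, float[]> vectors = new LinkedHashMap<>();
        vectors.put(0, new float[] {0f, 0f});
        vectors.put(5, new float[] {4f, 0f});
        vectors.put(10, new float[] {2f, 6f});
        float[] naive = centroidBySequentialIds(vectors, 2);
        float[] fixed = centroidByActualIds(vectors, 2);
        System.out.println(naive[0] + "," + naive[1]); // prints 0.0,0.0 (only ID 0 was seen)
        System.out.println(fixed[0] + "," + fixed[1]); // prints 2.0,2.0
    }
}
```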

vbekiaris
