I found an issue during the IndexIVFPQ query process in version 1.7.2, and I'm not sure if it's a bug. I hope you can help me solve it.

Open yangshubin2023 opened this issue 2 years ago • 1 comments

Summary

//.........

int main() {
    int d = 64;      // dimension
    int nb = 100000; // database size
    int nq = 10000;  // nb of queries

    std::mt19937 rng;
    std::uniform_real_distribution<> distrib;

    float* xb = new float[d * nb];
    float* xq = new float[d * nq];

    // .......

    int nlist = 100;
    int k = 4;
    int m = 8;                       // number of PQ sub-quantizers (8 bits each, so 8 bytes per vector)
    faiss::IndexFlatIP quantizer(d); // coarse quantizer over the nlist centroids
    faiss::IndexIVFPQ index(&quantizer, d, nlist, m, 8, faiss::METRIC_INNER_PRODUCT);

    index.train(nb, xb);
    index.add(nb, xb);

    { // sanity check
        idx_t* I = new idx_t[k * 5];
        float* D = new float[k * 5];

        index.search(5, xb, k, D, I);

        //........

        delete[] I;
        delete[] D;
    }

    delete[] xb;
    delete[] xq;

    return 0;
}

Platform

Operating System: Ubuntu 20.04.3 LTS
Kernel: Linux 5.4.0-122-generic
Architecture: x86-64

Running on:

  • CPU

Interface:

  • C++

Reproduction instructions

After reading the Faiss code, I found that IndexIVFPQ uses residuals during both training and adding: the product quantizer's fine-grained centroids are trained on the residual vectors (each vector minus its coarse centroid), and the stored codes also encode residuals.

However, during the query process, it looks as if the query vector x is not converted to a residual before it is compared with the fine-grained centroids to compute the distance. Is this correct?

In my understanding, the query vector x should also be converted to a residual vector x', and only a distance computed between x' and the fine-grained centroids makes sense.

Comparing the original vector x against fine-grained centroids that were built from residual data confuses me.

I hope you can help me answer it. Thank you~

yangshubin2023, Aug 22 '23 08:08

You are correct that the query vector is also compared based on the residual vectors. What makes you think this is not the case?

mdouze, Aug 28 '23 05:08
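
For readers who hit the same confusion, here is a minimal sketch of why the query may not look "residualized" in the inner-product case even though residuals are taken into account. It relies on two identities for a database vector y = c + r (coarse centroid c plus residual r): for L2, ||q - (c + r)||^2 = ||(q - c) - r||^2, so the query residual q - c is compared with r; for inner product, <q, c + r> = <q, c> + <q, r>, so the raw query can be compared with r as long as the coarse term <q, c> is added back. My reading of the Faiss source is that the METRIC_INNER_PRODUCT path uses the second form, but that is an assumption and is not confirmed in this thread. The code below does not call Faiss; `Vec`, `l2_via_residual` and `ip_via_residual` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Plain inner product <a, b>.
float inner_product(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Squared L2 distance ||a - b||^2.
float l2_sqr(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Element-wise a + b.
Vec add(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

// Element-wise a - b.
Vec sub(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] - b[i];
    return out;
}

// L2 case: compare the query residual (q - c) with the stored residual r.
float l2_via_residual(const Vec& q, const Vec& c, const Vec& r) {
    return l2_sqr(sub(q, c), r);
}

// Inner-product case: the raw query is compared with r,
// and the coarse term <q, c> is added back.
float ip_via_residual(const Vec& q, const Vec& c, const Vec& r) {
    return inner_product(q, c) + inner_product(q, r);
}
```

With y = add(c, r), l2_sqr(q, y) agrees with l2_via_residual(q, c, r), and inner_product(q, y) agrees with ip_via_residual(q, c, r). In a real index r would be the PQ-decoded approximation of the residual, so both sides approximate the exact distance rather than equal it.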