Merge multiple index files into one [not on_disk]

Open juncaofish opened this issue 4 years ago • 4 comments

To improve efficiency, I use multiple processes to build several index files (block_0.index, block_1.index, block_2.index, ...), but I want to merge them into one index that can be loaded into GPU memory later. How can I implement this?

BTW, each index was built with "OPQ64_256,IVF256,PQ32".

juncaofish avatar Feb 24 '21 03:02 juncaofish

See this example code: https://gist.github.com/mdouze/7331e6fc1da2334f30706b9b9962068b

mdouze avatar Feb 26 '21 11:02 mdouze

Thanks! Will give you feedback after a try.

juncaofish avatar Mar 01 '21 03:03 juncaofish

The script works like a charm. Thanks a lot. Where can I find more details about the Python API? Should I go through the C++ source code or the paper to work out this kind of problem?

juncaofish avatar Mar 01 '21 15:03 juncaofish

I seem to have a funky issue with the merging procedure described in the gist.

This is the code I'm currently using to merge indices:

"""populate an index"""
from tempfile import NamedTemporaryFile
import faiss

def merge_invlists(il_src, il_dest):
    """
    merge inverted lists from two ArrayInvertedLists
    may be added to main Faiss at some point
    From: https://gist.github.com/mdouze/7331e6fc1da2334f30706b9b9962068b
    """
    assert il_src.nlist == il_dest.nlist
    assert il_src.code_size == il_dest.code_size

    for list_no in range(il_src.nlist):

        il_dest.add_entries(
            list_no,
            il_src.list_size(list_no),
            il_src.get_ids(list_no),
            il_src.get_codes(list_no),
        )


def merge_indices(indices, merged_index_name):
    """merge multiple indices into 1"""
    tmp_empty = NamedTemporaryFile()
    tmp_merged_idx = NamedTemporaryFile()

    empty_index = faiss.read_index(indices[0])
    empty_index.reset()
    faiss.write_index(empty_index, tmp_empty.name)

    empty_index = faiss.read_index(tmp_empty.name)

    ntotal = empty_index.ntotal  # = 0

    indices_read = []
    for path in indices:
        # read each index from its own file path
        index = faiss.read_index(path)
        indices_read.append(index)

    for i in indices_read:
        merge_invlists(
            faiss.extract_index_ivf(i).invlists,
            faiss.extract_index_ivf(empty_index).invlists,
        )
        ntotal += i.ntotal

    empty_index.ntotal = faiss.extract_index_ivf(empty_index).ntotal = ntotal

    faiss.write_index(empty_index, tmp_merged_idx.name)

Where indices is a list of files representing indexes.

The issue I'm encountering is this: given index_1, index_2, and index_3, if I serve them individually, the results are spread across them. After running the merging procedure I would expect the results to be the same. However, the search tends to return items that were in index_1 (not in index_2 or index_3).

@mdouze do you have any insights on why this could occur? Do I have to retrain the merged index in order to return the correct result?

Any help would be very much appreciated! Thank you

rodrigoalmeida94 avatar May 27 '22 14:05 rodrigoalmeida94

@rodrigoalmeida94 I am also facing the same issue, i.e. I am not able to search across multiple indexes. Did you find a solution for this?

Sankalp991 avatar Dec 19 '22 15:12 Sankalp991

Hi, folks, I have the same request: how can I stack/combine/merge several indexes...

I tried faiss.merge_into, but got "Error: 'ivf' failed" with IndexFlatIP. I found this PR, titled "Make merge_into support all types of Index", but I still have the same issue after updating faiss to version 1.7.3.

btw, I see [not on_disk] in the header, maybe there is an alternative solution "on disk"?

koren-v avatar Mar 06 '23 04:03 koren-v

merge_into is not specific to on-disk indexes, so it should work. Would you mind opening an issue and posting the code that you are using?

mdouze avatar Mar 06 '23 08:03 mdouze

@mdouze thanks for your answer, sure, created a new issue here

koren-v avatar Mar 06 '23 12:03 koren-v