I found a more RAM-friendly method to create the index and train the searcher

lxj616 opened this issue 3 years ago · 4 comments

Congratulations on the fantastic new RDM research, but ...

For anyone without enough RAM to build the index with the ScaNN script, I propose using autofaiss to create the index instead.

I have a workstation with 64 GB of RAM, and when I ran train_searcher against the openimages database from the README, I hit OOM very quickly (no matter how much I reduced the parameters, it simply would not fit). Here is how I solved the problem:

from autofaiss import build_index
import numpy as np

# Load the four openimages embedding parts and concatenate them into one array
embedding = None
for npz_i in range(1, 5):
    npz_path = f"rdm/retrieval_databases/openimages/2000000x768-part_{npz_i}.npz"
    npz_i_obj = np.load(npz_path)
    if embedding is None:
        embedding = npz_i_obj['embedding']
    else:
        embedding = np.concatenate((embedding, npz_i_obj['embedding']), axis=0)

del npz_i_obj  # drop the reference to the last part file

# autofaiss picks a quantized index type that fits within the given memory caps
build_index(embeddings=embedding, index_path="faiss_index/knn.index",
            index_infos_path="faiss_index/index_infos.json", max_index_memory_usage="30G",
            current_memory_available="40G")

If you have less than 64 GB of RAM, adjust max_index_memory_usage="30G" and current_memory_available="40G" to match your machine.
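
If even the concatenated array does not fit in your RAM, autofaiss can also read embeddings from a folder of plain .npy files, so nothing ever has to be concatenated in memory. A rough sketch (the openimages_npy folder and part_*.npy names are just placeholders I picked):

import os
import numpy as np
from autofaiss import build_index

# Convert each .npz part to a plain .npy file, one part in RAM at a time
os.makedirs("rdm/retrieval_databases/openimages_npy", exist_ok=True)
for npz_i in range(1, 5):
    npz_path = f"rdm/retrieval_databases/openimages/2000000x768-part_{npz_i}.npz"
    emb = np.load(npz_path)["embedding"]
    np.save(f"rdm/retrieval_databases/openimages_npy/part_{npz_i}.npy", emb)
    del emb

# autofaiss accepts a directory of .npy files as `embeddings`
build_index(embeddings="rdm/retrieval_databases/openimages_npy",
            index_path="faiss_index/knn.index",
            index_infos_path="faiss_index/index_infos.json",
            max_index_memory_usage="30G",
            current_memory_available="40G")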

When loading the searcher in knn2img_faiss.py, for example:

index_faiss = faiss.read_index("faiss_index/knn.index", faiss.IO_FLAG_MMAP | faiss.IO_FLAG_READ_ONLY)

However, note that faiss returns (distances, indices), so with _, I = index_faiss.search(x, opt.knn) the neighbor indices are the second return value, not the first as in the ScaNN code.
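
To make the ordering concrete, here is a minimal self-contained sketch (the dummy queries are mine; 768-dim float32 matches the openimages embeddings):

import faiss
import numpy as np

# Memory-map the index instead of loading it fully into RAM
index_faiss = faiss.read_index("faiss_index/knn.index",
                               faiss.IO_FLAG_MMAP | faiss.IO_FLAG_READ_ONLY)

# faiss expects float32 queries of shape (n_queries, dim)
x = np.random.rand(2, 768).astype("float32")

# faiss returns (distances, indices); ScaNN's search_batched returns
# (indices, distances), so the unpacking order flips
distances, I = index_faiss.search(x, 4)
print(I.shape)  # (2, 4): the ids of the 4 nearest neighbors per query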

— lxj616, Jul 29 '22

Can you share clearer instructions for running the index?

— pwillia7, Aug 06 '22

OK, I managed to get this working with 32 GB of RAM by concatenating the four part files into one and then moving the original parts out of the folder. A sketch of one way to do the merge is below.
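
In case it helps anyone else on a tight RAM budget, something along these lines works for the merge (a sketch, not my exact script; merged_embeddings.npy is a placeholder name). It streams each part into an on-disk .npy through a memmap, so at most one part is resident in RAM at a time:

import numpy as np

base = "rdm/retrieval_databases/openimages/2000000x768-part_{}.npz"

# First pass: collect the total row count, dim and dtype without keeping parts around
rows, dim, dtype = 0, None, None
for i in range(1, 5):
    emb = np.load(base.format(i))["embedding"]
    rows += emb.shape[0]
    dim, dtype = emb.shape[1], emb.dtype
    del emb

# Second pass: write the merged array straight to disk, one part at a time
out = np.lib.format.open_memmap("merged_embeddings.npy", mode="w+",
                                dtype=dtype, shape=(rows, dim))
offset = 0
for i in range(1, 5):
    emb = np.load(base.format(i))["embedding"]
    out[offset:offset + emb.shape[0]] = emb
    offset += emb.shape[0]
    del emb
out.flush()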

— pwillia7, Aug 06 '22

Nice! I ended up making a 96GB swapfile. Still eats RAM when running inference though :|

— nerdyrodent, Aug 13 '22

It took me about 80 GB of RAM (htop reported 72.7 GB) to build the ScaNN openimages index via scripts/train_searcher.py. I had to create a swapfile to get it to run to completion.

I think the authors should mention the RAM requirements in the docs when they tell us to build the index ourselves, because having 80+ GB of RAM isn't exactly common.

— usr-ein, Aug 24 '22