IndexFlatL2/IndexFlatIP python multiprocess memory leak
Summary
Platform
OS: Ubuntu 20.04.5 LTS
Faiss version: v1.7.2->v1.7.4
Installed from: pip install
Faiss compilation options: no
Running on:
- [x] CPU
- [ ] GPU
Interface:
- [ ] C++
- [x] Python
Reproduction instructions
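I've run into this bug in two places:
- In a Python ProcessPoolExecutor:

    import faiss
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def dis2(q):
        # Build a fresh flat L2 index over two reference arrays on every call
        black_npy = np.stack([np.load("black-b316-512.npy"), np.load("black-224-224-512.npy")], 0)
        index = faiss.IndexFlatL2(q.shape[-1])
        index.add(black_npy)
        D, I = index.search(q, 1)
        return D.mean()

    pool = ProcessPoolExecutor()
    pool.submit(dis2, q)  # in my code the pool is an attribute: self.pool.submit(dis2, q)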
- In the Triton Python backend (BLS)
When I downgraded to version 1.6.1, the memory leak went away. Other versions are still being tested.
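When comparing versions like this, it can help to log exactly which build each process has loaded. The following is a minimal sketch using only the standard library; the distribution names it tries (`faiss-cpu`, `faiss-gpu`, `faiss`) are assumptions about how the package was published, and a conda install may not be visible to `importlib.metadata` at all.

```python
from importlib import metadata


def installed_version(candidates=("faiss-cpu", "faiss-gpu", "faiss")):
    """Return (distribution name, version) for the first candidate found, else None.

    The candidate names are assumptions; adjust them to match your install.
    """
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


print(installed_version())
```

Logging this once per worker next to the memory readings makes it easier to tie a given memory curve to a specific version.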
Can you elaborate on where you're seeing the memory leak inside of faiss? In the above example, I don't see a reason why faiss would behave differently when running in a single process vs. multiple processes.
Another note: please install faiss via conda as per the official guidance.
Version v1.8.0 has the same problem (memory leak).
> Can you elaborate on where you're seeing the memory leak inside of faiss? In the above example, I don't see a reason why faiss would behave differently when running in a single process vs. multiple processes.
> Another note: please install faiss via conda as per the official guidance.
I observed this problem through k8s pod metrics (pod memory: mem.memused). I thought this was strange too, but when I downgraded faiss (v1.7.4 -> 1.6.1), the problem was solved. I'll try it with conda.
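Pod-level metrics add up every process in the container, so a per-process reading can help narrow down where the growth is. The following is a rough sketch, assuming a Linux worker where `/proc/self/status` is readable; `rss_kb` and `track` are hypothetical helper names, not part of faiss.

```python
import os
import resource


def rss_kb():
    """Return this process's current resident set size in KB (Linux)."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # the kernel reports this in kB
    except OSError:
        pass
    # Fallback: peak (not current) RSS; KB on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def track(task, iterations):
    """Call task() repeatedly and return (pid, [RSS in KB after each call])."""
    samples = []
    for _ in range(iterations):
        task()
        samples.append(rss_kb())
    return os.getpid(), samples
```

One way to use this would be to have the function submitted to the pool return `os.getpid()` and `rss_kb()` along with its result, so that growth can be attributed to a specific worker rather than to the pod as a whole.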
> Another note: please install faiss via conda as per the official guidance.
conda install -c pytorch faiss-cpu=1.7.4
conda install mkl=2021
I installed version 1.7.4 via conda. It also has this problem, @pankajsingh88.
Hi @XIAO-FAN-5257, I tried to repro with this unit test on the latest branch. The dataset helper get_dataset_2() is already a utility in FAISS.
    def test_4002(self):
        d = 512
        nb = 1000
        nt = 1500
        nq = 200
        for i in range(20):
            (xt, xb, xq) = get_dataset_2(d, nt, nb, nq)
            index = faiss.IndexFlatL2(d)
            # Sample process memory before and after adding vectors
            mem1 = faiss.get_mem_usage_kb()
            index.add(xb)
            index.add(xq)
            mem2 = faiss.get_mem_usage_kb()
            print("mem1: {}, mem2: {}".format(mem1, mem2))
Results (memory usage in KB):
mem1: 208252, mem2: 213700
mem1: 257488, mem2: 262144
mem1: 305584, mem2: 310456
mem1: 353888, mem2: 358756
mem1: 402288, mem2: 407064
mem1: 444252, mem2: 449096
mem1: 416744, mem2: 421380
mem1: 428912, mem2: 433660
mem1: 441116, mem2: 445952
mem1: 417136, mem2: 421768
mem1: 429308, mem2: 434052
mem1: 441496, mem2: 446336
mem1: 417512, mem2: 422148
mem1: 429684, mem2: 434428
mem1: 441872, mem2: 446704
mem1: 417880, mem2: 422508
mem1: 429960, mem2: 434700
mem1: 442060, mem2: 446888
mem1: 417972, mem2: 422596
mem1: 430036, mem2: 434772
Even though the process memory creeps up at the start, it then starts being freed and never goes over 450 MB. Do you see something different, where it grows forever? Could you attach some figures of the memory increase that you observe, along with your dataset size?
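The plateau can be checked directly from the numbers rather than by eye. This is a small sketch over the mem2 readings printed above; `plateau_summary` is a hypothetical helper, and a single later reading below the peak is only a rough signal, not proof that there is no leak.

```python
# mem2 readings (KB) copied from the results above, one per loop iteration
mem2_kb = [
    213700, 262144, 310456, 358756, 407064, 449096, 421380, 433660, 445952, 421768,
    434052, 446336, 422148, 434428, 446704, 422508, 434700, 446888, 422596, 434772,
]


def plateau_summary(samples_kb):
    """Return (peak, index of the peak, highest reading after the peak or None)."""
    peak = max(samples_kb)
    peak_at = samples_kb.index(peak)
    after = samples_kb[peak_at + 1:]
    return peak, peak_at, (max(after) if after else None)


peak, peak_at, later_max = plateau_summary(mem2_kb)
print(f"peak {peak} KB at iteration {peak_at}, highest later reading {later_max} KB")
```

Unbounded growth would keep setting new peaks. Here the peak falls on the sixth iteration and no later reading exceeds it. Running the same check on readings collected from the leaking workers would show whether their curve behaves differently.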
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.