[bug report] about the mem_limit of dockers
Hi, team!
I've been trying to test several algorithms on the benchmark and used the following command:
python3 run.py --parallelism 31 --dataset gist-960-euclidean --runs 5 --force
I found that many algorithms failed with exit code 137 (the container was killed, which typically indicates an out-of-memory kill). Checking the log, I saw that some algorithms were allocated less memory than others. The machine I used is the same as Erikbern's (i.e., an r6i.16xlarge machine on AWS, with 512 GB of memory). Part of the log reads as follows:
We expect each algorithm to be limited to about 512 GB / 32 = 16 GB of memory. However, you can see that these algorithms were only allocated about 11 GB, far less than expected.
About Fix
Checking the code, I found a bug in how mem_limit is set at line 73 of ann_benchmarks/main.py:
mem_limit = int((psutil.virtual_memory().available - memory_margin) / args.parallelism)
Because "available" reflects only the memory that is free at the moment of the call, algorithms in the first batch get about 16 GB each, but once those containers are running, the available memory shrinks, so algorithms in later batches are limited to less than 16 GB.
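To make the effect concrete, here is a minimal arithmetic sketch of the two formulas. The safety margin (4 GiB) and the amount of memory held by the first batch (160 GiB) are hypothetical numbers chosen for illustration, not values from the actual run:

```python
# Sketch (hypothetical margin and usage numbers): why dividing *available*
# memory penalizes later batches, while dividing *total* memory does not.

GIB = 1 << 30
TOTAL = 512 * GIB            # r6i.16xlarge memory
MEMORY_MARGIN = 4 * GIB      # hypothetical safety margin
PARALLELISM = 31             # from the run.py command above

def mem_limit(free_bytes: int) -> int:
    # Mirrors the current formula: splits whatever is free *right now*.
    return int((free_bytes - MEMORY_MARGIN) / PARALLELISM)

# Batch 1: nothing is running yet, so "available" is roughly the total.
batch1_limit = mem_limit(TOTAL)              # ~16.4 GiB per container

# Batch 2: suppose the first batch's containers now hold ~160 GiB in total,
# so psutil would report that much less as "available".
batch2_limit = mem_limit(TOTAL - 160 * GIB)  # ~11.2 GiB per container

# The proposed fix divides the total memory, which does not change over time,
# so every batch gets the same limit as the first one.
fixed_limit = int((TOTAL - MEMORY_MARGIN) / PARALLELISM)
assert fixed_limit == batch1_limit
```

With these example numbers the second batch ends up near 11 GiB per container, which is consistent with the allocations seen in the log above.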
So I think this line should be modified to divide the total memory instead, which yields a stable limit:
mem_limit = int((psutil.virtual_memory().total - memory_margin) / args.parallelism)
Or have I misunderstood how mem_limit is meant to be set? Thanks!
Environment: an r6i.16xlarge machine on AWS