k-NN Add Memory Evaluation For different algorithm

Description

When we want to introduce a new algorithm or engine, we prefer to evaluate its performance, memory usage, and disk size. In benchmark tests, we can evaluate performance on a single node.

But we cannot evaluate how much memory an engine/algorithm takes, because in benchmarks the JVM makes it hard to isolate the memory used in the JNI layer alone.

In #946 we tried to assess memory usage for different algorithms, but when using benchmark/integration tests, Java heap memory and other memory usage make it hard to evaluate the algorithm's real memory usage.

So I wrote memory tests that use only faiss_wrapper/nmslib_wrapper. They can evaluate memory usage and file size. I use the vector file format from http://corpus-texmex.irisa.fr/ and added test_util::load_data to read sift.fvecs from the SIFT1M dataset.

I ran some tests; the results are reported below:
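For readers unfamiliar with the .fvecs layout, here is a minimal sketch of a reader. It is not the PR's actual test_util::load_data; the names FvecsData and LoadFvecs are hypothetical. It assumes the layout documented on the corpus-texmex site (each record is a 4-byte integer dimension d followed by d 4-byte floats) and a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for illustration; not a type from the PR.
struct FvecsData {
    int32_t dim = 0;            // dimension shared by all vectors
    std::vector<float> values;  // row-major: count() * dim floats
    std::size_t count() const {
        return dim > 0 ? values.size() / static_cast<std::size_t>(dim) : 0;
    }
};

// Reads every record of a .fvecs file into one flat buffer.
// Assumes a little-endian host, since the raw bytes are copied as-is.
FvecsData LoadFvecs(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    FvecsData data;
    int32_t d = 0;
    // Each iteration consumes one record: the dimension header, then d floats.
    while (in.read(reinterpret_cast<char*>(&d), sizeof(d))) {
        if (d <= 0) throw std::runtime_error("invalid dimension in " + path);
        if (data.dim == 0) data.dim = d;
        if (d != data.dim) throw std::runtime_error("mixed dimensions in " + path);

        const std::size_t offset = data.values.size();
        data.values.resize(offset + static_cast<std::size_t>(d));
        const auto bytes = static_cast<std::streamsize>(d * sizeof(float));
        if (!in.read(reinterpret_cast<char*>(&data.values[offset]), bytes))
            throw std::runtime_error("truncated record in " + path);
    }
    // The buffer always holds a whole number of vectors.
    assert(data.dim == 0 ||
           data.values.size() % static_cast<std::size_t>(data.dim) == 0);
    return data;
}
```

Keeping all vectors in one flat buffer matches what the Faiss and nmslib native APIs generally expect (a contiguous float array plus a dimension), so the loaded data could be passed to a wrapper without another copy.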

SIFT1M

Algorithm	Index RES	FileSize	Query RES
HNSW32	1.8GB	634MB	769MB
NSG64	3.1GB	586MB	752MB
NSG32	3.1GB	577MB	639MB

GIST960

Algorithm	Index RES	FileSize	Query RES
HNSW32	11.2GB	3.9GB	3944MB
NSG64	12GB	3.7GB	3923MB
NSG32	12GB	3.7GB	3806MB

Usage:

Go to http://corpus-texmex.irisa.fr/, download the SIFT1M dataset, and extract it so that the base vectors are at a path like 'dataset/sift/sift_base.fvecs'.
Then run the different tests, for example:

./bin/jni_memory_test --gtest_filter=FaissNSGQueryMemoryTest.*

Issues Resolved

#946

Check List

- [ ] New functionality includes testing.
  - [ ] All tests pass
- [ ] New functionality has been documented.
  - [ ] New functionality has javadoc added
- [ ] Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Sep 15 '23 09:09 luyuncheng

@luyuncheng can you add details around what is Index RES and Query RES?

Sep 21 '23 05:09 navneet1v

> @luyuncheng can you add details around what is Index RES and Query RES?

@navneet1v Index RES: building the graph index takes a long time, so I used a monitor to record the average resident memory during the index build. Query RES: I queried 1000 vectors sequentially and used a monitor to record the average resident memory during the queries.
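As an illustration of what "average resident memory" can mean in practice, here is a minimal, Linux-only sketch that averages periodic samples of the VmRSS field in /proc/self/status. It is an assumption about how such a monitor could work, not the monitor actually used for the numbers above; ParseVmRssKb, CurrentRssKb, and RssAverager are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Extracts the value of the "VmRSS:   <n> kB" line from the contents of a
// /proc/<pid>/status stream. Returns -1 if the line is missing or malformed.
long ParseVmRssKb(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (!(fields >> kb)) return -1;
            return kb;
        }
    }
    return -1;
}

// Resident set size of the current process in kB, or -1 when unavailable
// (for example on platforms without /proc).
long CurrentRssKb() {
    std::ifstream status("/proc/self/status");
    return status ? ParseVmRssKb(status) : -1;
}

// Accumulates samples and reports their arithmetic mean.
class RssAverager {
public:
    // Takes one sample of the current process; skips it if RSS is unavailable.
    void Sample() {
        const long kb = CurrentRssKb();
        if (kb >= 0) Add(kb);
    }
    void Add(long kb) {
        assert(kb >= 0);
        sum_kb_ += kb;
        ++samples_;
    }
    double AverageKb() const {
        return samples_ > 0 ? static_cast<double>(sum_kb_) / samples_ : 0.0;
    }

private:
    long long sum_kb_ = 0;
    long samples_ = 0;
};
```

Calling Sample() at a fixed interval while the index is built or queried gives an average comparable to the RES columns. The same parsing would also work from a separate process reading /proc/<pid>/status, which keeps the measurement outside the test binary.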

Sep 21 '23 07:09 luyuncheng

In general, I really like having the ability to use the jni_wrapper in order to test our code with real data sets (not just random data). This has a lot of potential to help us debug memory problems as well as performance problems.

That being said, I think that the memory monitoring should be done outside of the tests. Adding memory monitoring inside the test may make them difficult to work across platforms. I see the tests themselves more like JNI integration tests or end to end tests or microbenchmarks. We should remove all calls to get specific memory information from faiss and change the name from memory_test to integ_test or e2e_test or microbenchmarks. Instead, to check memory, I think that they can be run with an external monitor. For instance, I believe you used gperftools at some point.

Sep 26 '23 22:09 jmazanec15

@luyuncheng are you still working on this PR?

Jan 31 '24 06:01 navneet1v
