
Add memory evaluation for different algorithms

Open luyuncheng opened this issue 2 years ago • 4 comments

Description

When we introduce a new algorithm or engine, we want to evaluate its performance, memory usage, and disk size. With the benchmark tests, we can evaluate performance on a single node.

But we cannot evaluate how much memory an engine/algorithm takes, because in the benchmarks the JVM makes it hard to isolate the memory used in the JNI layer.

In #946 we tried to assess memory usage across different algorithms, but with the benchmark/integration tests, Java heap memory and other memory usage make it hard to measure an algorithm's real memory usage.

So I wrote memory tests that use only faiss_wrapper/nmslib_wrapper. They can evaluate memory usage and file size. I use the vector file format from http://corpus-texmex.irisa.fr/, and added test_util::load_data to read sift.fvecs from the SIFT1M dataset.
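For readers unfamiliar with the .fvecs layout: each record is a 4-byte integer dimension d followed by d 32-bit floats. Below is a minimal sketch of a reader for that layout. It is not the test_util::load_data added in this PR; the names FvecsData and ReadFvecs are hypothetical, and the sketch assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for illustration; not a type from this PR.
struct FvecsData {
    std::int32_t dim = 0;       // dimension shared by all vectors
    std::vector<float> values;  // row-major, size() == count() * dim
    std::size_t count() const {
        return dim > 0 ? values.size() / static_cast<std::size_t>(dim) : 0;
    }
};

// Reads up to max_vectors records from a .fvecs file.
// Each record: int32 dimension d, then d float32 components.
// Assumes the host is little-endian, like the file.
FvecsData ReadFvecs(const std::string& path,
                    std::size_t max_vectors =
                        std::numeric_limits<std::size_t>::max()) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    FvecsData out;
    std::int32_t d = 0;
    while (out.count() < max_vectors &&
           in.read(reinterpret_cast<char*>(&d), sizeof(d))) {
        if (d <= 0) throw std::runtime_error("invalid dimension");
        if (out.dim == 0) {
            out.dim = d;  // first record fixes the dimension
        } else if (d != out.dim) {
            throw std::runtime_error("inconsistent dimension");
        }
        const std::size_t old_size = out.values.size();
        out.values.resize(old_size + static_cast<std::size_t>(d));
        if (!in.read(reinterpret_cast<char*>(out.values.data() + old_size),
                     static_cast<std::streamsize>(d * sizeof(float)))) {
            throw std::runtime_error("truncated record");
        }
    }
    // Post-condition: the buffer holds a whole number of vectors.
    assert(out.dim == 0 ||
           out.values.size() % static_cast<std::size_t>(out.dim) == 0);
    return out;
}
```

Calling ReadFvecs("dataset/sift/sift_base.fvecs", 1000) would load only the first 1000 vectors, which is convenient for a quick smoke test before a full run.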

I ran some tests; the results are reported below:

SIFT1M

| Algorithm | Index RES | FileSize | Query RES |
|-----------|-----------|----------|-----------|
| HNSW32    | 1.8GB     | 634MB    | 769MB     |
| NSG64     | 3.1GB     | 586MB    | 752MB     |
| NSG32     | 3.1GB     | 577MB    | 639MB     |

GIST960

| Algorithm | Index RES | FileSize | Query RES |
|-----------|-----------|----------|-----------|
| HNSW32    | 11.2GB    | 3.9GB    | 3944MB    |
| NSG64     | 12GB      | 3.7GB    | 3923MB    |
| NSG32     | 12GB      | 3.7GB    | 3806MB    |
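As a rough reference point for the FileSize columns, the raw float32 payload of each dataset can be computed directly. This sketch assumes 1,000,000 base vectors per dataset (128 dimensions for SIFT1M, 960 for GIST), and the helper name RawVectorBytes is hypothetical. Whether the reported sizes use MB as 10^6 or 2^20 bytes is not stated, so the comparison is only approximate.

```cpp
#include <cstdint>

// Raw float32 payload, in bytes, for `count` vectors of `dim` dimensions.
constexpr std::uint64_t RawVectorBytes(std::uint64_t count, std::uint64_t dim) {
    return count * dim * sizeof(float);
}

// SIFT1M: 1,000,000 x 128 x 4 bytes = 512,000,000 bytes (about 512 MB).
static_assert(RawVectorBytes(1000000, 128) == 512000000ULL, "SIFT1M raw size");

// GIST: 1,000,000 x 960 x 4 bytes = 3,840,000,000 bytes (about 3.84 GB).
static_assert(RawVectorBytes(1000000, 960) == 3840000000ULL, "GIST raw size");
```

Under those assumptions, most of each index file is the vectors themselves, and the remainder (for example roughly 120 MB for HNSW32 on SIFT1M) is presumably graph structure and metadata.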

Usage:

  1. Go to http://corpus-texmex.irisa.fr/, download the SIFT1M dataset, and extract it into a directory, e.g. 'dataset/sift/sift_base.fvecs'

  2. Run the different tests, for example:

./bin/jni_memory_test --gtest_filter=FaissNSGQueryMemoryTest.*

Issues Resolved

#946

Check List

  • [ ] New functionality includes testing.
    • [ ] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [ ] Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

luyuncheng avatar Sep 15 '23 09:09 luyuncheng

@luyuncheng can you add details around what is Index RES and Query RES?

navneet1v avatar Sep 21 '23 05:09 navneet1v

@luyuncheng can you add details around what is Index RES and Query RES?

@navneet1v Index RES: building the graph index takes a long time, so I used a monitor to record the average resident memory during the index build. Query RES: I queried 1000 vectors sequentially and used a monitor to record the average resident memory during the queries.
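To make "average resident memory" concrete, here is a minimal sketch of such a monitor. It is not the monitor used for the numbers above. It assumes Linux, where /proc/<pid>/status exposes a VmRSS field in kB, and the names ReadRssKiB and AverageRssKiB are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <thread>

// Returns the resident set size of process `pid` in KiB, read from
// /proc/<pid>/status, or 0 if the file or field is unavailable.
// Linux only; pass "self" to inspect the calling process.
std::size_t ReadRssKiB(const std::string& pid = "self") {
    std::ifstream status("/proc/" + pid + "/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            std::size_t kib = 0;
            fields >> kib;  // value is reported in kB
            return kib;
        }
    }
    return 0;
}

// Averages `samples` RSS readings taken `interval` apart.
double AverageRssKiB(const std::string& pid, int samples,
                     std::chrono::milliseconds interval) {
    assert(samples > 0);
    double total = 0.0;
    for (int i = 0; i < samples; ++i) {
        total += static_cast<double>(ReadRssKiB(pid));
        if (i + 1 < samples) std::this_thread::sleep_for(interval);
    }
    return total / samples;
}
```

Because the sampler takes a pid, it can run in a separate process and watch the test binary from outside, rather than being compiled into the tests.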

luyuncheng avatar Sep 21 '23 07:09 luyuncheng

In general, I really like having the ability to use the jni_wrapper in order to test our code with real data sets (not just random data). This has a lot of potential to help us debug memory problems as well as performance problems.

That being said, I think that the memory monitoring should be done outside of the tests. Adding memory monitoring inside the test may make them difficult to work across platforms. I see the tests themselves more like JNI integration tests or end to end tests or microbenchmarks. We should remove all calls to get specific memory information from faiss and change the name from memory_test to integ_test or e2e_test or microbenchmarks. Instead, to check memory, I think that they can be run with an external monitor. For instance, I believe you used gperftools at some point.

jmazanec15 avatar Sep 26 '23 22:09 jmazanec15

@luyuncheng are you still working on this PR?

navneet1v avatar Jan 31 '24 06:01 navneet1v