k-NN
Add memory evaluation for different algorithms
Description
When we want to introduce a new algorithm or engine, we prefer to evaluate its performance, memory usage, and disk size. In benchmark tests, we can evaluate the performance on a single node.
But we cannot evaluate how much memory an engine or algorithm takes, because in the benchmarks the JVM makes it hard to measure the memory used in the JNI layer alone.
In #946 we tried to assess memory usage with different algorithms, but when using benchmark/integration tests, Java heap memory and other memory usage make it hard to evaluate an algorithm's real memory usage.
So I wrote memory tests that use only faiss_wrapper/nmslib_wrapper. They can evaluate memory usage and file size.
I use the vector file format from http://corpus-texmex.irisa.fr/ and added test_util::load_data to read sift.fvecs from the SIFT1M dataset.
I ran some tests; the results are in the following report:
SIFT1M
| Algorithm | Index RES | FileSize | Query RES |
|---|---|---|---|
| HNSW32 | 1.8GB | 634MB | 769MB |
| NSG64 | 3.1GB | 586MB | 752MB |
| NSG32 | 3.1GB | 577MB | 639MB |
GIST960
| Algorithm | Index RES | FileSize | Query RES |
|---|---|---|---|
| HNSW32 | 11.2GB | 3.9GB | 3944MB |
| NSG64 | 12GB | 3.7GB | 3923MB |
| NSG32 | 12GB | 3.7GB | 3806MB |
Usage:
- Go to http://corpus-texmex.irisa.fr/, download the SIFT1M dataset, and unzip it so that the base vectors are at a path like 'dataset/sift/sift_base.fvecs'.
- Run the different tests, for example:
./bin/jni_memory_test --gtest_filter=FaissNSGQueryMemoryTest.*
Issues Resolved
#946
Check List
- [ ] New functionality includes testing.
- [ ] All tests pass
- [ ] New functionality has been documented.
- [ ] New functionality has javadoc added
- [ ] Commits are signed as per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
@luyuncheng can you add details around what is Index RES and Query RES?
@navneet1v Index RES: building the graph index takes a long time, so I used a monitor to check the average resident memory during the index build. Query RES: I queried 1000 vectors sequentially and used a monitor to check the average resident memory during the queries.
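To make "average resident memory" concrete, here is a rough sketch of how such a figure could be sampled on Linux. It is not the monitor used for the numbers above, which is not named in this thread. The function names are hypothetical, and the sketch assumes the `VmRSS:` line of `/proc/<pid>/status`, which reports resident set size in kB.

```cpp
#include <fstream>
#include <istream>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Extracts the VmRSS value (in kB) from the text of /proc/<pid>/status.
// Returns -1 if the field is missing or cannot be parsed.
long parse_vm_rss_kb(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;
            return fields ? kb : -1;
        }
    }
    return -1;
}

// Reads the current resident set size of a process (Linux only).
long read_rss_kb(int pid) {
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    return parse_vm_rss_kb(status);
}

// Averages RSS samples collected while the index build or query loop runs.
double average_kb(const std::vector<long>& samples) {
    if (samples.empty()) return 0.0;
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}
```

An external monitor along these lines would call `read_rss_kb` for the test process at a fixed interval, collect the samples, and report `average_kb` once the process exits.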
In general, I really like having the ability to use the jni_wrapper in order to test our code with real data sets (not just random data). This has a lot of potential to help us debug memory problems as well as performance problems.
That being said, I think that the memory monitoring should be done outside of the tests. Adding memory monitoring inside the test may make them difficult to work across platforms. I see the tests themselves more like JNI integration tests or end to end tests or microbenchmarks. We should remove all calls to get specific memory information from faiss and change the name from memory_test to integ_test or e2e_test or microbenchmarks. Instead, to check memory, I think that they can be run with an external monitor. For instance, I believe you used gperftools at some point.
@luyuncheng are you still working on this PR?