SPTAG
When the number of vectors is too large, the index build fails to complete!
When there are too many vectors, such as 5 million (n = 1024×1024×5), the index build fails to complete. The program stops at this point, printing: "Save Data To xxxxx\vectors.bin". I also noticed that vectors.bin had grown to 300 GB! That is not normal: the correct file size should be 10 GB.
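As a quick sanity check, the expected size of vectors.bin can be computed from the vector count and dimension (4 bytes per float32 value) and compared with the file on disk. The helpers below are a hypothetical sketch, not part of SPTAG; the `header_slack` allowance for a small file header is an assumption, since the exact vectors.bin layout is not stated in this thread.

```python
import os

def expected_vectors_bytes(n, dim, bytes_per_value=4):
    """Raw size of n float32 vectors of the given dimension."""
    return n * dim * bytes_per_value

def vectors_file_looks_sane(path, n, dim, header_slack=64):
    """Hypothetical check: True if the file is at least the raw vector size and
    at most that plus a small header allowance (the slack value is an assumption)."""
    expected = expected_vectors_bytes(n, dim)
    actual = os.path.getsize(path)
    return expected <= actual <= expected + header_slack

# n = 1024*1024*5 vectors of 512 float32 values should take 10 GiB of raw data
print(expected_vectors_bytes(1024 * 1024 * 5, 512) / 2**30)  # 10.0
```

Assuming vectors.bin is written inside the folder passed to `Save` (here 'testindices'), `vectors_file_looks_sane(os.path.join('testindices', 'vectors.bin'), n, Dimension)` would return False for the 300 GB file described above.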
The code that caused the error is as follows:
```python
import SPTAG
import numpy as np

n = 1024 * 1024 * 5  # this size makes vectors.bin grow to 300 GB!
k = 3
r = 3
Dimension = 512  # vectors.bin should be n × Dimension × 4 bytes: 1024×1024×512×4 = 2 GB per 1M vectors, so 10 GB here

def testBuild(algo, distmethod, x, out):
    i = SPTAG.AnnIndex(algo, 'Float', x.shape[1])
    i.SetBuildParam("NumberOfThreads", '4')
    i.SetBuildParam("DistCalcMethod", distmethod)
    ret = i.Build(x.tobytes(), x.shape[0])
    i.Save(out)

def Test(algo, distmethod):
    # Row j of x is filled with the value j
    x = np.ones((n, Dimension), dtype=np.float32) * np.reshape(np.arange(n, dtype=np.float32), (n, 1))
    q = np.ones((r, Dimension), dtype=np.float32) * np.reshape(np.arange(r, dtype=np.float32), (r, 1)) * 2
    print("Build.............................")
    testBuild(algo, distmethod, x, 'testindices')

if __name__ == '__main__':
    Test('BKT', 'L2')
```
How can I solve this? Please help!
Setting n = 1024×1024×3 also causes this error, but n = 1024×1024×2 works. I don't know why. Is there still a bug in the wrapper?
Last night I tested building the index with indexbuilder.exe. It seems that when the raw data is larger than 4 GB (e.g. 6 GB), the program fails to complete and stops at "Save Data To xxxx\vectors.bin". When I opened the output directory, vectors.bin had grown to 300 GB!
This indicates that the problem is not caused by the Python wrapper. But what is the real cause?
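The data sizes in this report can be tabulated with n × Dimension × 4 bytes. The sketch below only does that arithmetic (the helper names are illustrative, not SPTAG code): the failing cases (3M and 5M vectors at 512 dimensions) are the ones whose raw data exceeds 4 GiB (2^32 bytes), while the working 2M case is exactly 4 GiB.

```python
FOUR_GIB = 2**32  # the "4G" threshold from the report, taken as 4 GiB

def raw_bytes(n, dim):
    # float32 vectors: 4 bytes per value
    return n * dim * 4

def exceeds_4gib(n, dim):
    return raw_bytes(n, dim) > FOUR_GIB

# Cases from this report, all with Dimension = 512
for label, n in [("2M", 1024 * 1024 * 2), ("3M", 1024 * 1024 * 3), ("5M", 1024 * 1024 * 5)]:
    size_gib = raw_bytes(n, 512) / 2**30
    print(f"n = {label}: {size_gib:.1f} GiB, exceeds 4 GiB: {exceeds_4gib(n, 512)}")
# n = 2M: 4.0 GiB, exceeds 4 GiB: False
# n = 3M: 6.0 GiB, exceeds 4 GiB: True
# n = 5M: 10.0 GiB, exceeds 4 GiB: True
```

This lines up with the indexbuilder.exe observation, but it is only a correlation; it does not by itself identify where the oversized write happens.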
Note: in my tests, memory usage reaches 20 GB or more, and building the whole index takes a long time (up to several hours), so you will not encounter this problem unless you test with a very large amount of data.
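Part of that memory usage can be accounted for on the Python side of the reproduction script: multiplying a full-size `np.ones` array by the column of row indices allocates the product as a second full-size array, and `x.tobytes()` returns a new bytes object holding a full copy of `x` before `Build` is called. The sketch below (a hypothetical helper, not SPTAG code) builds the same test matrix with a single full-size allocation via `np.broadcast_to(...).copy()` and confirms that `tobytes()` produces a buffer as large as the array. Whether SPTAG keeps further copies during the build is not established in this thread.

```python
import numpy as np

def make_test_data(n, dim):
    """Same values as the reproduction script (row j is filled with j).
    broadcast_to returns a read-only view without allocating n x dim values;
    copy() then materializes it once as a C-contiguous float32 array."""
    col = np.arange(n, dtype=np.float32).reshape(n, 1)
    return np.broadcast_to(col, (n, dim)).copy()

n, dim = 4, 3
x_new = make_test_data(n, dim)
x_orig = np.ones((n, dim), dtype=np.float32) * np.reshape(np.arange(n, dtype=np.float32), (n, 1))
print(np.array_equal(x_new, x_orig))  # True

buf = x_new.tobytes()  # a separate bytes object: a full copy of the array data
print(len(buf) == x_new.nbytes)  # True
```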
This is a screenshot of the problem:
indexbuilder.exe -d 512 -v float -i BIN:vv.bin -o test1 -a BKT -m Cosine
I have tried your Python code, and the index build finishes successfully. What is your runtime environment?
I am using SPTAG compiled from source with Python 3.6 64-bit and VS2017, and I run the code on Windows 7 SP1 with Python 3.6 64-bit.
The same problem occurs when running the above code on Windows 10 with Python 3.6 64-bit.
Yep, it takes a very long time to build the index.
I tried the experiment with a dataset of shape (1024 * 1024 * 5, 128), i.e. 5,242,880 vectors of 128 dimensions. Environment:
- Core CPU, 8 GB RAM
- Python 2.7
- Ran: Test('BKT', 'L2')
It took around 8-9 hours to finish building the index. @MaggieQi
What's your OS version? Linux or Windows?
Linux, Ubuntu 18.04 64bit
@deepxuexi I also tried your Python code on Windows Server with VS 2015 using the latest code, and it also finishes successfully. Maybe you can pull the latest code and try again.
I compiled SPTAG again with Windows 7 + VS2015 + TBB 4.4 + CMake 3.14.4 + swigwin 4.0. The same error occurred while running: when there are too many vectors, the saved vectors.bin file grows to 300 GB. Can you tell me your detailed build environment, such as the versions of Windows, VS2015, CMake, SWIG, and TBB?
These are my compiled files (SPTAG_PY36.zip); you can run TEST_ERROR.py directly with Python 3.6 to reproduce the error. Could you send me a copy of your compiled files (just the .exe and .pyd files generated by SPTAG, plus your tbb.dll)? Thank you!
@deepxuexi my environment is Windows 10 (also tried on Windows Server 2016) + VS2015 + CMake 3.12.3 + SWIG 3.0.12 + Boost 1.67.0. Which version of the SPTAG code do you use? Do you compile with CMake or directly with SPTAG.sln?
I use the SPTAG code from 2019-5-20 and compile it with "ALL_BUILD.vcxproj" in VS2015.
I just tried compiling the code with VS2015 + CMake 3.12.3 + SWIG 3.0.12 + Boost 1.67.0 and Python 3.6, but the problem remains. Could you tell me your TBB and Python versions? Thank you very much!