OverflowError in method 'AnnIndex_AddWithMetaData' when running test script

Open peidaqi opened this issue 5 years ago • 12 comments

Describe the bug
OverflowError: in method 'AnnIndex_AddWithMetaData', argument 4 of type 'int' when running the Python test script from "Get Started".

To Reproduce
Create a new Python file in the Release directory, then copy and paste the Python test script from "Get Started" into it and run it.

Additional context
My environment:

  • Ubuntu 18.04, kernel 4.15.0 x86_64
  • swig 3.0 and 4.0 (tried both)
  • cmake 3.14.4
  • gcc 7.4.0
  • python 3.6

peidaqi avatar May 17 '19 17:05 peidaqi

I get the same error on OSX with Python 3.7.3.

m0baxter avatar May 17 '19 18:05 m0baxter

This is a very strange error. I looked into the swig generated CoreInterface_pwrap.cpp, and everything seems to be fine.

peidaqi avatar May 17 '19 18:05 peidaqi

I'm also getting the same error. Here's some context, including the stack trace:

[1, 101, 3]
[10.0, 10.0, 90.0]
[1, 3, 101]
[10.0, 10.0, 10.0]
[3, 5, 103]
[10.0, 10.0, 10.0]
AddWithMetaData.............................
Setting NumberOfThreads with value 4
Setting DistCalcMethod with value L2
Traceback (most recent call last):
  File "/data/test.py", line 90, in <module>
    Test('BKT', 'L2')
  File "/data/test.py", line 83, in Test
    testAddWithMetaData(None, x, m, 'testindices', algo, distmethod)
  File "/data/test.py", line 56, in testAddWithMetaData
    if i.AddWithMetaData(x.tobytes(), s, x.shape[0]):
  File "/app/Release/SPTAG.py", line 143, in AddWithMetaData
    return _SPTAG.AnnIndex_AddWithMetaData(self, p_data, p_meta, p_num)
OverflowError: in method 'AnnIndex_AddWithMetaData', argument 4 of type 'int'

tekumara avatar May 19 '19 12:05 tekumara

I have the same issue with Ubuntu 18.04, swig 4.0, cmake 3.14.4 and Python 3.6.0. It also occurs when building with Docker.

RFHO-BDSS avatar May 20 '19 09:05 RFHO-BDSS

def testBuildWithMetaData(algo, distmethod, x, s, out):
    i = SPTAG.AnnIndex(algo, 'Float', x.shape[1])
    i.SetBuildParam("NumberOfThreads", '4')
    i.SetBuildParam("DistCalcMethod", distmethod)
    shape = float(x.shape[0])
    print(type(shape))
    if i.BuildWithMetaData(x.tobytes(), s, shape):
        i.Save(out)

def Test(algo, distmethod):
    x = np.ones((n, 10), dtype=np.float32) * np.reshape(np.arange(n, dtype=np.float32), (n, 1))
    q = np.ones((r, 10), dtype=np.float32) * np.reshape(np.arange(r, dtype=np.float32), (r, 1)) * 2
    m = ''
    for i in range(n):
        m += str(i) + '\n'

    print ("Build with metadata.............................")
    testBuildWithMetaData(algo, distmethod, x, m, 'testindices')

if __name__ == '__main__':
    Test('BKT', 'L2')

raises the same error (argument 4 of type 'int') as this version, which passes an int instead:

def testBuildWithMetaData(algo, distmethod, x, s, out):
    i = SPTAG.AnnIndex(algo, 'Float', x.shape[1])
    i.SetBuildParam("NumberOfThreads", '4')
    i.SetBuildParam("DistCalcMethod", distmethod)
    shape = int(x.shape[0])
    print(type(shape))
    if i.BuildWithMetaData(x.tobytes(), s, shape):
        i.Save(out)

def Test(algo, distmethod):
    x = np.ones((n, 10), dtype=np.float32) * np.reshape(np.arange(n, dtype=np.float32), (n, 1))
    q = np.ones((r, 10), dtype=np.float32) * np.reshape(np.arange(r, dtype=np.float32), (r, 1)) * 2
    m = ''
    for i in range(n):
        m += str(i) + '\n'

    print ("Build with metadata.............................")
    testBuildWithMetaData(algo, distmethod, x, m, 'testindices')

if __name__ == '__main__':
    Test('BKT', 'L2')

even though the first example converts argument 4 (the vector count) to float.

RFHO-BDSS avatar May 20 '19 09:05 RFHO-BDSS

Same here, running the Docker version.

SueSu-Wish avatar May 21 '19 07:05 SueSu-Wish

This issue only appears in Python 3. Adding m = m.encode() before the calls to BuildWithMetaData and AddWithMetaData makes the error disappear.
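
A minimal sketch of that workaround, assuming the metadata is built the same way as in the test script (one newline-terminated entry per vector). The helper names encode_metadata and build_index are hypothetical and are not part of SPTAG; only the AnnIndex, SetBuildParam, BuildWithMetaData and Save calls are taken from the scripts in this thread. Whether this resolves the error in every environment is not confirmed here.

```python
def encode_metadata(n):
    """Build newline-terminated metadata for n vectors and return it as bytes.

    Hypothetical helper: in Python 3 a str is Unicode text, so it is
    encoded to bytes before being passed to the SWIG wrapper.
    """
    m = ''.join(str(i) + '\n' for i in range(n))
    return m.encode()


def build_index(x, out, algo='BKT', distmethod='L2'):
    """Build and save an index with metadata (sketch, not called here).

    Assumes x is a 2-D numpy float32 array and that the SPTAG module
    from the Release directory is importable.
    """
    import SPTAG  # imported lazily so the helper above works without SPTAG

    index = SPTAG.AnnIndex(algo, 'Float', x.shape[1])
    index.SetBuildParam("NumberOfThreads", '4')
    index.SetBuildParam("DistCalcMethod", distmethod)
    meta = encode_metadata(x.shape[0])  # bytes, not str
    if index.BuildWithMetaData(x.tobytes(), meta, x.shape[0]):
        index.Save(out)
        return True
    return False
```

For example, build_index(x, 'testindices') would replace the body of testBuildWithMetaData in the test script; the only functional change is that the metadata is passed as bytes rather than str.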

MaggieQi avatar May 21 '19 07:05 MaggieQi

@MaggieQi That doesn't work in Python 3.6: the interpreter reports a problem with argument 4, which is p_num. I doubt that changing the metadata, which is argument 3, would fix it, unless the error comes from argument 3's buffer overflowing and writing into argument 4.

I think the SPTAG development team made a very bad decision in exposing the library through C++ classes. SWIG has limited support for C++, and this may lead to other problems, not to mention the use of a C++ class reference (ByteArray) as a parameter.

peidaqi avatar May 21 '19 14:05 peidaqi

The m.encode() trick seems to be working.

As for how the metadata is represented in the C++ code, there seem to be overflow issues when the strings are too long. I was getting segfaults all over the place with strings longer than a few hundred characters.

m0baxter avatar May 22 '19 14:05 m0baxter

I tried multiple times and it still doesn't work for me. What's your environment?

I think it would be better to change the C++ implementation of AddWithMetaData to take raw char* buffers. SWIG might be confused by the different constructors of ByteArray.

peidaqi avatar May 22 '19 14:05 peidaqi

You are probably right about it being better to fix the underlying problem.

I am on

  • OSX 10.14.5
  • Python 3.7.3 (the one in brew)
  • cmake 3.14.3
  • gcc 9.1.0
  • swig 4.0.0
  • most recent master branch as of writing.

m0baxter avatar May 22 '19 14:05 m0baxter

That is a good suggestion! I will try to remove ByteArray from the wrapper.

MaggieQi avatar May 23 '19 02:05 MaggieQi