
Python 3.7 and fresh checkout from master: dependency installation issue

Open DmitryKey opened this issue 4 years ago • 21 comments

Hello! I've tried installing dependencies under Python 3.7.10 and got the following output. What Python version is supported / recommended?

    Running setup.py install for numpy ... error
    ERROR: Command errored out with exit status 1:
     command: /Users/dmitrykan/project/ann-benchmarks/venv/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/setup.py'"'"'; __file__='"'"'/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-record-wrmxmqjz/install-record.txt --single-version-externally-managed --compile --install-headers /Users/dmitrykan/project/ann-benchmarks/venv/include/site/python3.7/numpy
         cwd: /private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/
    Complete output (195 lines):
    Running from numpy source directory.
    
    Note: if you need reliable uninstall behavior, then install
    with pip instead of using `setup.py install`:
    
      - `pip install .`       (from a git repo or downloaded source
                               release)
      - `pip install numpy`   (last NumPy release on PyPi)
    
    
    blas_opt_info:
    blas_mkl_info:
      libraries mkl_rt not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    blis_info:
      libraries blis not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    openblas_info:
      libraries openblas not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    atlas_3_10_blas_threads_info:
    Setting PTATLAS=ATLAS
      libraries tatlas not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    atlas_3_10_blas_info:
      libraries satlas not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    atlas_blas_threads_info:
    Setting PTATLAS=ATLAS
      libraries ptf77blas,ptcblas,atlas not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    atlas_blas_info:
      libraries f77blas,cblas,atlas not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
      FOUND:
        extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
        extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
        define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    
    /bin/sh: svnversion: command not found
    non-existing path in 'numpy/distutils': 'site.cfg'
    /bin/sh: svnversion: command not found
    F2PY Version 2
    lapack_opt_info:
    lapack_mkl_info:
      libraries mkl_rt not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    openblas_lapack_info:
      libraries openblas not found in ['/Users/dmitrykan/project/ann-benchmarks/venv/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    atlas_3_10_threads_info:
    Setting PTATLAS=ATLAS
      libraries tatlas,tatlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries lapack_atlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries tatlas,tatlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/local/lib
      libraries tatlas,tatlas not found in /usr/lib
      libraries lapack_atlas not found in /usr/lib
    <class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
      NOT AVAILABLE
    
    atlas_3_10_info:
      libraries satlas,satlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries lapack_atlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries satlas,satlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/local/lib
      libraries satlas,satlas not found in /usr/lib
      libraries lapack_atlas not found in /usr/lib
    <class 'numpy.distutils.system_info.atlas_3_10_info'>
      NOT AVAILABLE
    
    atlas_threads_info:
    Setting PTATLAS=ATLAS
      libraries ptf77blas,ptcblas,atlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries lapack_atlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/local/lib
      libraries ptf77blas,ptcblas,atlas not found in /usr/lib
      libraries lapack_atlas not found in /usr/lib
    <class 'numpy.distutils.system_info.atlas_threads_info'>
      NOT AVAILABLE
    
    atlas_info:
      libraries f77blas,cblas,atlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries lapack_atlas not found in /Users/dmitrykan/project/ann-benchmarks/venv/lib
      libraries f77blas,cblas,atlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/local/lib
      libraries f77blas,cblas,atlas not found in /usr/lib
      libraries lapack_atlas not found in /usr/lib
    <class 'numpy.distutils.system_info.atlas_info'>
      NOT AVAILABLE
    
      FOUND:
        extra_compile_args = ['-msse3']
        extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
        define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    
    /usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'define_macros'
      warnings.warn(msg)
    running install
    running build
    running config_cc
    unifing config_cc, config, build_clib, build_ext, build commands --compiler options
    running config_fc
    unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
    running build_src
    build_src
    building py_modules sources
    creating build
    creating build/src.macosx-11-x86_64-3.7
    creating build/src.macosx-11-x86_64-3.7/numpy
    creating build/src.macosx-11-x86_64-3.7/numpy/distutils
    building library "npymath" sources
    customize Gnu95FCompiler
    Found executable /usr/local/bin/gfortran
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/setup.py", line 392, in <module>
        setup_package()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/setup.py", line 384, in setup_package
        setup(**metadata)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/core.py", line 169, in setup
        return old_setup(**new_attr)
      File "/Users/dmitrykan/project/ann-benchmarks/venv/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/install.py", line 62, in run
        r = self.setuptools_run()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/install.py", line 36, in setuptools_run
        return distutils_install.run(self)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/build.py", line 47, in run
        old_build.run(self)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/build_src.py", line 148, in run
        self.build_sources()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/build_src.py", line 159, in build_sources
        self.build_library_sources(*libname_info)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/build_src.py", line 294, in build_library_sources
        sources = self.generate_sources(sources, (lib_name, build_info))
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/build_src.py", line 377, in generate_sources
        source = func(extension, build_dir)
      File "numpy/core/setup.py", line 672, in get_mathlib_info
        st = config_cmd.try_link('int main(void) { return 0;}')
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/command/config.py", line 243, in try_link
        self._check_compiler()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/command/config.py", line 81, in _check_compiler
        c_compiler=self.compiler)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/fcompiler/__init__.py", line 842, in new_fcompiler
        c_compiler=c_compiler)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/fcompiler/__init__.py", line 816, in get_default_fcompiler
        c_compiler=c_compiler)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/fcompiler/__init__.py", line 765, in _find_existing_fcompiler
        c.customize(dist)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/fcompiler/__init__.py", line 521, in customize
        linker_so_flags = self.flag_vars.linker_so
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/environment.py", line 39, in __getattr__
        return self._get_var(name, conf_desc)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/environment.py", line 53, in _get_var
        var = self._hook_handler(name, hook)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/fcompiler/__init__.py", line 700, in _environment_hook
        return hook()
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/fcompiler/gnu.py", line 309, in get_flags_linker_so
        flags = GnuFCompiler.get_flags_linker_so(self)
      File "/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/numpy/distutils/fcompiler/gnu.py", line 138, in get_flags_linker_so
        os.environ['MACOSX_DEPLOYMENT_TARGET'] = target
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/os.py", line 686, in __setitem__
        value = self.encodevalue(value)
      File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/os.py", line 756, in encode
        raise TypeError("str expected, not %s" % type(value).__name__)
    TypeError: str expected, not int
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/dmitrykan/project/ann-benchmarks/venv/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/setup.py'"'"'; __file__='"'"'/private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-install-l_rkp2wp/numpy_7ba5c87712f044d5af4cf340e9f6bc24/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/2l/f04dpd917vx50cyl8fftcxzr0000gn/T/pip-record-wrmxmqjz/install-record.txt --single-version-externally-managed --compile --install-headers /Users/dmitrykan/project/ann-benchmarks/venv/include/site/python3.7/numpy Check the logs for full command output.
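
The last frames of the traceback can be reproduced without NumPy: `os.environ` only accepts `str` values, so assigning an integer raises the same `TypeError`. A minimal sketch follows. The value `11` is an assumption standing in for whatever integer the build computed as the deployment target (the `macosx-11` build directory in the log suggests macOS 11), and `DEMO_DEPLOYMENT_TARGET` is a hypothetical variable name used so the real one stays untouched.

```python
import os

# Hypothetical variable name, so the real MACOSX_DEPLOYMENT_TARGET is left alone.
VAR = "DEMO_DEPLOYMENT_TARGET"
target = 11  # assumption: an int, as the "not int" in the traceback implies


def set_target(value):
    """Try to store value in os.environ; return the error message, or None on success."""
    try:
        os.environ[VAR] = value
    except TypeError as exc:
        return str(exc)
    return None


error = set_target(target)       # fails the same way as the last traceback frame
fixed = set_target(str(target))  # converting to str first avoids the error
print(error)
print(fixed, os.environ[VAR])
```

This only illustrates the failure mode; it does not patch the NumPy build itself.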

DmitryKey avatar Jun 05 '21 17:06 DmitryKey

Python 3.6 is required for installing the current requirements.

On most setups I've tried with more recent versions of Python, just removing the pinned versions of the libraries worked fine.
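
Removing the pins can be done by hand or with a few lines of Python. The sketch below is one possible way to do it, not something taken from the repository: `unpin` is a hypothetical helper, and the sample lines reuse the `h5py==2.7.1` and `matplotlib==2.1.0` pins mentioned in this thread.

```python
import re


def unpin(line):
    """Drop an exact '==version' pin from one requirements line (hypothetical helper)."""
    # Leave comments and blank lines unchanged.
    if not line.strip() or line.lstrip().startswith("#"):
        return line
    # Remove the first '==<version>' specifier, keeping the package name.
    return re.sub(r"\s*==\s*[^\s;#]+", "", line, count=1)


pinned = ["h5py==2.7.1", "matplotlib==2.1.0", "# plotting", ""]
unpinned = [unpin(entry) for entry in pinned]
print(unpinned)
```

Unpinned requirements let pip pick the newest release that supports the running interpreter, at the cost of less reproducible installs.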

maumueller avatar Jun 07 '21 10:06 maumueller

The installation problems above seem to have to do with NumPy, not ann-benchmarks.

Either way, it would be good to bump the Python version to 3.8 or ideally 3.9. I think 3.6 is ancient at this point.

erikbern avatar Jun 07 '21 13:06 erikbern

Thanks so much for your responses! I'll try a lower Python version and report back here.

DmitryKey avatar Jun 10 '21 13:06 DmitryKey

Hi there, I'm having a similar issue. However, I tried Python 3.6.13 and h5py requires 3.7+:

Collecting h5py==2.7.1 (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/41/7a/6048de44c62fc5e618178ef9888850c3773a9e4be249e5e673ebce0402ff/h5py-2.7.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/Users/nbrempel/.pyenv/versions/3.6.13/lib/python3.6/site-packages/setuptools/sandbox.py", line 154, in save_modules
        yield saved
      File "/Users/nbrempel/.pyenv/versions/3.6.13/lib/python3.6/site-packages/setuptools/sandbox.py", line 195, in setup_context
        yield
      File "/Users/nbrempel/.pyenv/versions/3.6.13/lib/python3.6/site-packages/setuptools/sandbox.py", line 250, in run_setup
        _execfile(setup_script, ns)
      File "/Users/nbrempel/.pyenv/versions/3.6.13/lib/python3.6/site-packages/setuptools/sandbox.py", line 45, in _execfile
        exec(code, globals, locals)
      File "/var/folders/cv/4kr757hn62gb0j3vdcm58jx40000gp/T/easy_install-z7ol4p_z/numpy-1.21.0/setup.py", line 34, in <module>
        # RUN_REQUIRES can be removed when setup.py test is removed
    RuntimeError: Python version >= 3.7 required.

(3.7 fails with other errors)

nrempel avatar Jul 05 '21 22:07 nrempel

In my case, wheel was not available in my pyenv environment. Running pyenv exec pip install --upgrade pip setuptools wheel solved my problem.

nrempel avatar Jul 05 '21 23:07 nrempel

Took a while to come back to this. @nrempel, thanks for sharing a recipe that worked for you. I've just tried to bootstrap the project with 3.7 and got this in PyCharm:

Screenshot 2021-07-08 at 20 58 59

the screenshot looks a bit strange, because matplotlib==2.1.0.

Above that message there is additional information on how to proceed:

   * The following required packages can not be built:                            
   * freetype, png 
   * Try installing freetype with `brew install freetype` 
   * Try installing png with `brew install libpng`

DmitryKey avatar Jul 08 '21 18:07 DmitryKey

Not sure if we need to pin the matplotlib version, feel free to use the latest and see if it works!

erikbern avatar Jul 08 '21 21:07 erikbern

I tried to upgrade to all latest versions with Python 3.7. The following library versions installed correctly:

ansicolors==1.1.8
docker==2.6.1
h5py==3.3.0
matplotlib==3.4.2
numpy==1.21.0
pyyaml==5.4
psutil==5.6.6
scipy==1.7.0
scikit-learn==0.24.2
jinja2==2.10

Next, I'm going to verify by building the Docker image and running it.

DmitryKey avatar Jul 09 '21 14:07 DmitryKey

With the versions above, I got this distribution of success/fail:

Install Status:
{'vespa': 'fail'}
{'elastiknn': 'fail'}
{'n2': 'success'}
{'flann': 'fail'}
{'pynndescent': 'fail'}
{'puffinn': 'success'}
{'annoy': 'success'}
{'hnswlib': 'success'}
{'scann': 'fail'}
{'nearpy': 'success'}
{'diskann_pq': 'fail'}
{'diskann': 'fail'}
{'opendistroknn': 'fail'}
{'faiss': 'success'}
{'sklearn': 'success'}
{'nmslib': 'success'}
{'elasticsearch': 'fail'}
{'rpforest': 'success'}
{'datasketch': 'success'}
{'kgraph': 'success'}
{'mih': 'success'}
{'milvus': 'success'}
{'dolphinn': 'success'}
{'sptag': 'success'}
{'mrpt': 'success'}
{'ngt': 'success'}
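
The status lines above can be tallied with a short script. This is only a sketch: `tally` is a hypothetical helper, and the text is copied from the list above rather than read from any file the project produces.

```python
import ast
from collections import Counter

# Status lines copied from the list above.
status_text = """\
{'vespa': 'fail'}
{'elastiknn': 'fail'}
{'n2': 'success'}
{'flann': 'fail'}
{'pynndescent': 'fail'}
{'puffinn': 'success'}
{'annoy': 'success'}
{'hnswlib': 'success'}
{'scann': 'fail'}
{'nearpy': 'success'}
{'diskann_pq': 'fail'}
{'diskann': 'fail'}
{'opendistroknn': 'fail'}
{'faiss': 'success'}
{'sklearn': 'success'}
{'nmslib': 'success'}
{'elasticsearch': 'fail'}
{'rpforest': 'success'}
{'datasketch': 'success'}
{'kgraph': 'success'}
{'mih': 'success'}
{'milvus': 'success'}
{'dolphinn': 'success'}
{'sptag': 'success'}
{'mrpt': 'success'}
{'ngt': 'success'}
"""


def tally(text):
    """Count statuses and collect the failed names (hypothetical helper)."""
    counts = Counter()
    failed = []
    for line in text.splitlines():
        if not line.strip():
            continue
        entry = ast.literal_eval(line)  # each line is a one-item dict literal
        for name, status in entry.items():
            counts[status] += 1
            if status == "fail":
                failed.append(name)
    return counts, failed


counts, failed = tally(status_text)
print(dict(counts))
print(failed)
```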

Need to investigate further.

DmitryKey avatar Jul 09 '21 20:07 DmitryKey

Just wanted to log things as I go -- sorry if this is the wrong thread (figured, I'd keep all in one place to avoid creating multiple tickets):

python run.py --algorithm kgraph

leads to:

2021-07-10 13:38:40,382 - annb - INFO - Order: [Definition(algorithm='kgraph', constructor='KGraph', module='ann_benchmarks.algorithms.kgraph', docker_tag='ann-benchmarks-kgraph', arguments=['angular', {'reverse': -1, 'K': 200, 'L': 300, 'S': 20}, False], query_argument_groups=[[1], [2], [3], [4], [5], [10], [20], [30], [40], [50], [60], [70], [80], [90], [100]], disabled=False)]
2021-07-10 13:38:42,486 - annb.2c5441a317 - INFO - Created container 2c5441a317: CPU limit 1, mem limit 5444025088, timeout 7200, command ['--dataset', 'glove-100-angular', '--algorithm', 'kgraph', '--module', 'ann_benchmarks.algorithms.kgraph', '--constructor', 'KGraph', '--runs', '5', '--count', '10', '["angular", {"reverse": -1, "K": 200, "L": 300, "S": 20}, false]', '[1]', '[2]', '[3]', '[4]', '[5]', '[10]', '[20]', '[30]', '[40]', '[50]', '[60]', '[70]', '[80]', '[90]', '[100]']
2021-07-10 13:38:58,253 - annb.2c5441a317 - INFO - Generating control...
2021-07-10 13:39:03,596 - annb.2c5441a317 - INFO - Initializing...
2021-07-10 13:39:08,414 - annb.2c5441a317 - ERROR - Generating control...
Initializing...

2021-07-10 13:39:08,416 - annb.2c5441a317 - ERROR - Child process for container 2c5441a317 raised exception 137

Is it possible to add more colour to exception 137?

DmitryKey avatar Jul 10 '21 10:07 DmitryKey

@DmitryKey

The docker containers are still going to use python 3.6 if you didn't update the Dockerfile in https://github.com/erikbern/ann-benchmarks/blob/master/install/Dockerfile by using a more recent ubuntu release. You could use the old requirements.txt inside the docker containers (https://github.com/erikbern/ann-benchmarks/blob/master/install/Dockerfile#L8-L9) and another file locally to test whether this is the problem.
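
On the number itself: 137 follows the common convention of reporting 128 plus the signal number when a process is killed by a signal, so it corresponds to signal 9 (SIGKILL). A container hitting its memory limit is one frequent cause of a SIGKILL, though the log above does not confirm that this is what happened here. The helper below is a hypothetical sketch for decoding such codes on a POSIX system.

```python
import signal


def describe_exit_code(code):
    """Describe an exit code using the 128 + signal-number convention (hypothetical helper)."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


print(describe_exit_code(137))
```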

maumueller avatar Jul 10 '21 16:07 maumueller

Thanks @maumueller! I've upgraded the common Dockerfile to Ubuntu 20.04 and adjusted the Python installation instructions:

-FROM ubuntu:18.04
+FROM ubuntu:20.04
 
 RUN apt-get update
-RUN apt-get install -y python3-numpy python3-scipy python3-pip build-essential git
+RUN apt-get install python3.7
+RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install python3-numpy python3-scipy python3-pip build-essential git
 RUN pip3 install -U pip

Next, I had to modify sptag's Dockerfile:

RUN apt-get update && DEBIAN_FRONTEND="noninteractive" apt-get -y install wget build-essential libtbb-dev software-properties-common swig

Running the algorithm with python run.py --algorithm sptag begins normally, but after a while I'm getting:

2021-07-11 16:13:27,517 - annb.3ed2396351 - INFO - [4] Hash table is full! Set HashTableExponent to larger value (default is 2). NewHashTableExponent=3 NewPoolSize=131071
2021-07-11 16:13:29,093 - annb.3ed2396351 - INFO - [4] Hash table is full! Set HashTableExponent to larger value (default is 2). NewHashTableExponent=3 NewPoolSize=131071
2021-07-11 16:13:29,395 - annb.3ed2396351 - INFO - [4] Hash table is full! Set HashTableExponent to larger value (default is 2). NewHashTableExponent=3 NewPoolSize=131071
2021-07-11 16:13:43,022 - annb.3ed2396351 - INFO - [4] Hash table is full! Set HashTableExponent to larger value (default is 2). NewHashTableExponent=3 NewPoolSize=131071
2021-07-11 16:14:00,279 - annb.3ed2396351 - INFO - [4] Hash table is full! Set HashTableExponent to larger value (default is 2). NewHashTableExponent=3 NewPoolSize=131071






2021-07-11 18:14:07,082 - annb.3ed2396351 - ERROR - Container.wait for container 3ed2396351 failed with exception
Traceback (most recent call last):
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/urllib3/response.py", line 438, in _error_catcher
    yield
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/urllib3/response.py", line 764, in read_chunked
    self._update_chunk_length()
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/urllib3/response.py", line 694, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/usr/local/Cellar/python@3.7/3.7.11/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/requests/models.py", line 753, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/urllib3/response.py", line 572, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/urllib3/response.py", line 793, in read_chunked
    self._original_response.close()
  File "/usr/local/Cellar/python@3.7/3.7.11/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/urllib3/response.py", line 443, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dmitrykan/search/vs/ann-benchmarks/ann_benchmarks/runner.py", line 258, in run_docker
    exit_code = container.wait(timeout=timeout)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/docker/models/containers.py", line 441, in wait
    return self.client.api.wait(self.id, **kwargs)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/docker/api/container.py", line 1257, in wait
    res = self._post(url, timeout=timeout)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/docker/api/client.py", line 187, in _post
    return self.post(url, **self._set_request_timeout(kwargs))
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/requests/sessions.py", line 697, in send
    r.content
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/requests/models.py", line 831, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/Users/dmitrykan/search/vs/ann-benchmarks/venv/lib/python3.7/site-packages/requests/models.py", line 760, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.

If you have ideas on what to check, please let me know.

Pasting the running command in full, just in case:

(venv) dmitrykan@Dmitrys-MacBook-Pro ann-benchmarks % python run.py --algorithm sptag
2021-07-11 15:44:58,905 - annb - INFO - running only sptag
2021-07-11 15:44:59,851 - annb - INFO - Order: [Definition(algorithm='sptag', constructor='Sptag', module='ann_benchmarks.algorithms.sptag', docker_tag='ann-benchmarks-sptag', arguments=['angular', 'KDT'], query_argument_groups=[[100], [200], [400], [1000], [2000], [4000]], disabled=False), Definition(algorithm='sptag', constructor='Sptag', module='ann_benchmarks.algorithms.sptag', docker_tag='ann-benchmarks-sptag', arguments=['angular', 'BKT'], query_argument_groups=[[100], [200], [400], [1000], [2000], [4000]], disabled=False)]
2021-07-11 15:45:00,773 - annb.3ed2396351 - INFO - Created container 3ed2396351: CPU limit 1, mem limit 6130084608, timeout 7200, command ['--dataset', 'glove-100-angular', '--algorithm', 'sptag', '--module', 'ann_benchmarks.algorithms.sptag', '--constructor', 'Sptag', '--runs', '5', '--count', '10', '["angular", "KDT"]', '[100]', '[200]', '[400]', '[1000]', '[2000]', '[4000]']
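
The timestamps in the log excerpts above can be compared against the configured timeout of 7200 seconds. The sketch below only does that arithmetic: `parse_ts` is a hypothetical helper, and the two timestamps are copied from the "Created container" and "Container.wait ... failed with exception" lines.

```python
from datetime import datetime


def parse_ts(text):
    """Parse a log timestamp such as '2021-07-11 15:45:00,773' (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


created = parse_ts("2021-07-11 15:45:00,773")  # "Created container 3ed2396351"
failed = parse_ts("2021-07-11 18:14:07,082")   # "Container.wait ... failed with exception"
timeout_s = 7200                               # "timeout 7200" from the log

elapsed_s = (failed - created).total_seconds()
print(f"elapsed: {elapsed_s:.0f} s, timeout: {timeout_s} s")
print("wait failed after the timeout had passed:", elapsed_s > timeout_s)
```

The comparison shows that the failure was logged after the two-hour limit had passed, which fits a timeout rather than an early crash.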

DmitryKey avatar Jul 11 '21 15:07 DmitryKey

Some of the algorithms will time out, that's normal.

I also wouldn't expect all the images to build properly. There are always some issues with a few of them.

If you get it working with Ubuntu 20.04 and Python 3.7 (or higher), I would love it if you could submit a pull request.

erikbern avatar Jul 13 '21 15:07 erikbern

@erikbern thanks! Actually, I think it could be better to always use a specific release of each algorithm (where available). Have you considered this? For instance, I was just trying to compile faiss and can see that it tries to pull certain resources from Python 3.8:

 > [8/8] RUN python3 -c 'import faiss; print(faiss.IndexFlatL2)':                                                                                                                                        
#10 0.711 Traceback (most recent call last):                                                                                                                                                             
#10 0.711   File "<string>", line 1, in <module>                                                                                                                                                         
#10 0.711   File "<frozen zipimport>", line 259, in load_module                                                                                                                                          
#10 0.711   File "/usr/local/lib/python3.8/dist-packages/faiss-1.7.1-py3.8.egg/faiss/__init__.py", line 18, in <module>
#10 0.711   File "<frozen zipimport>", line 259, in load_module
#10 0.711   File "/usr/local/lib/python3.8/dist-packages/faiss-1.7.1-py3.8.egg/faiss/loader.py", line 65, in <module>
#10 0.711   File "<frozen zipimport>", line 259, in load_module
#10 0.711   File "/usr/local/lib/python3.8/dist-packages/faiss-1.7.1-py3.8.egg/faiss/swigfaiss.py", line 13, in <module>
#10 0.711 ImportError: cannot import name '_swigfaiss' from 'faiss' (/usr/local/lib/python3.8/dist-packages/faiss-1.7.1-py3.8.egg/faiss/__init__.py)
------
executor failed running [/bin/sh -c python3 -c 'import faiss; print(faiss.IndexFlatL2)']: exit code: 1

This might originate from Ubuntu 20.04 itself, but it just occurred to me that having reproducible "compilability" would be a big boost to usability.

DmitryKey avatar Jul 13 '21 19:07 DmitryKey

I fixed versions for the reproducibility setup for https://arxiv.org/abs/1807.05614 (e.g., https://github.com/maumueller/ann-benchmarks-reproducibility/blob/master/install/Dockerfile.ngt#L7) but I find it hard to imagine that developers will update these versions. Using the most recent version (with a chance of failing) seems more robust in terms of presenting up-to-date results.

maumueller avatar Jul 13 '21 19:07 maumueller

I'm torn about it – the benefit of pinning versions is that things will be more stable, but the drawback is that we'll use outdated versions during benchmarks. I think to some extent the onus over time could be on the library developers to make sure the latest version builds and runs correctly (eg the FAISS developers seem quite eager to push updates to ann-benchmarks) but I'm not sure if the "market power" is quite there for this to work more generally.

erikbern avatar Jul 13 '21 20:07 erikbern

@maumueller I agree. Maybe not the developers of each specific algorithm, but the developers of ann-benchmarks could pin the versions, and I see you did that in Milvus's case. I had issues compiling it with Python 3.7, but it worked with Python 3.6. Here is the full list of build results (some still failed, like diskann):

{'vespa': 'success'}
{'elastiknn': 'fail'}
{'n2': 'success'}
{'flann': 'fail'}
{'pynndescent': 'success'}
{'puffinn': 'success'}
{'annoy': 'success'}
{'hnswlib': 'success'}
{'scann': 'fail'}
{'nearpy': 'success'}
{'diskann_pq': 'fail'}
{'diskann': 'fail'}
{'opendistroknn': 'fail'}
{'faiss': 'success'}
{'sklearn': 'success'}
{'nmslib': 'success'}
{'elasticsearch': 'fail'}
{'rpforest': 'success'}
{'datasketch': 'success'}
{'kgraph': 'success'}
{'mih': 'success'}
{'milvus': 'success'}
{'dolphinn': 'success'}
{'sptag': 'success'}
{'mrpt': 'success'}
{'ngt': 'success'}
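For a quick summary, the status lines above can be tallied with a short script. This is only a sketch: it assumes each line is a one-entry Python dict literal, exactly as pasted, and the variable names are illustrative.

```python
import ast
from collections import Counter

# Status lines copied verbatim from the list above.
STATUS_LINES = """\
{'vespa': 'success'}
{'elastiknn': 'fail'}
{'n2': 'success'}
{'flann': 'fail'}
{'pynndescent': 'success'}
{'puffinn': 'success'}
{'annoy': 'success'}
{'hnswlib': 'success'}
{'scann': 'fail'}
{'nearpy': 'success'}
{'diskann_pq': 'fail'}
{'diskann': 'fail'}
{'opendistroknn': 'fail'}
{'faiss': 'success'}
{'sklearn': 'success'}
{'nmslib': 'success'}
{'elasticsearch': 'fail'}
{'rpforest': 'success'}
{'datasketch': 'success'}
{'kgraph': 'success'}
{'mih': 'success'}
{'milvus': 'success'}
{'dolphinn': 'success'}
{'sptag': 'success'}
{'mrpt': 'success'}
{'ngt': 'success'}
"""


def parse_status(text):
    """Merge one-entry dict literals (one per line) into a single dict."""
    statuses = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            statuses.update(ast.literal_eval(line))
    return statuses


statuses = parse_status(STATUS_LINES)
counts = Counter(statuses.values())
failed = sorted(name for name, status in statuses.items() if status == "fail")

print(f"{counts['success']} succeeded, {counts['fail']} failed")
print("failed:", ", ".join(failed))
```

On the list as posted, this reports 19 successful builds and 7 failures.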

I'm talking with the Milvus developers about fixing the compilation issue with their latest v1.1.1 release. If that succeeds, I will submit a PR.

DmitryKey avatar Jul 19 '21 09:07 DmitryKey

@erikbern yes, you are right. It is quite a task to ask the whole ANN community to push updates in a timely manner. However, I was impressed to see that this repo is cited in the Google Research GitHub. Great job!

DmitryKey avatar Jul 19 '21 09:07 DmitryKey

Hey guys, I wanted to ask for your advice: I'm not sure if something is misconfigured on my side, but some of the algorithms run into a timeout issue.

Here is one example:

2021-07-23 15:58:24,017 - annb.17c6b01171 - INFO - Created container 17c6b01171: CPU limit 1, mem limit 9590553344, timeout 7200, command ['--dataset', 'glove-100-angular', '--algorithm', 'sptag', '--module', 'ann_benchmarks.algorithms.sptag', '--constructor', 'Sptag', '--runs', '5', '--count', '10', '["angular", "KDT"]', '[100]', '[200]', '[400]', '[1000]', '[2000]', '[4000]']

So SPTAG runs for a few hours, and the last several lines of the log are:

    2021-07-23 17:21:15,385 - annb.17c6b01171 - INFO - [4] Hash table is full! Set HashTableExponent to larger value (default is 2). NewHashTableExponent=3 NewPoolSize=131071
    2021-07-23 21:47:11,345 - annb.17c6b01171 - ERROR - Container.wait for container 17c6b01171 failed with exception
Traceback (most recent call last):
  File "/Users/dmitry/projects/github/vs/ann-benchmarks/venv/lib/python3.6/site-packages/urllib3/response.py", line 438, in _error_catcher
    yield
  File "/Users/dmitry/projects/github/vs/ann-benchmarks/venv/lib/python3.6/site-packages/urllib3/response.py", line 764, in read_chunked
    self._update_chunk_length()
  File "/Users/dmitry/projects/github/vs/ann-benchmarks/venv/lib/python3.6/site-packages/urllib3/response.py", line 694, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dmitry/projects/github/vs/ann-benchmarks/venv/lib/python3.6/site-packages/requests/models.py", line 758, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/Users/dmitry/projects/github/vs/ann-benchmarks/venv/lib/python3.6/site-packages/urllib3/response.py", line 572, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/Users/dmitry/projects/github/vs/ann-benchmarks/venv/lib/python3.6/site-packages/urllib3/response.py", line 793, in read_chunked
    self._original_response.close()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/dmitry/projects/github/vs/ann-benchmarks/venv/lib/python3.6/site-packages/urllib3/response.py", line 443, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.

Is there anything to tweak on the Docker / OS side? I will try to prevent my OS from going to sleep to see if this helps.
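One thing that can be read directly from the log is how long the container had existed when `Container.wait` failed, compared with the configured `timeout 7200`. The sketch below only does that arithmetic on the two timestamps quoted above; the function name is illustrative, and it does not establish why the wait failed when it did.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2021-07-23 15:58:24,017".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


created = "2021-07-23 15:58:24,017"   # "Created container" line
failed = "2021-07-23 21:47:11,345"    # "Container.wait ... failed" line
timeout = 7200                        # seconds, from the "Created container" line

elapsed = elapsed_seconds(created, failed)
print(f"elapsed: {elapsed:.0f} s, configured timeout: {timeout} s")
print(f"elapsed / timeout: {elapsed / timeout:.2f}")
```

The failure is logged about 5 hours 49 minutes after the container was created, well past the 2-hour limit.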

DmitryKey avatar Jul 24 '21 06:07 DmitryKey

I tweaked the sleep schedule of my machine -- still the same issue. I will upgrade Docker next, from 3.3.1 to a more recent version.

I also noticed a consistent "insufficient memory" failure for the Puffinn algorithm. Is this a known issue?

2021-07-27 19:25:31,506 - annb.408f02c65f - INFO - Created container 408f02c65f: CPU limit 1, mem limit 9475463936, timeout 7200, command ['--dataset', 'glove-100-angular', '--algorithm', 'puffinn', '--module', 'ann_benchmarks.algorithms.puffinn', '--constructor', 'Puffinn', '--runs', '5', '--count', '10', '["angular", 268435456, "fht_crosspolytope"]', '[0.1]', '[0.2]', '[0.5]', '[0.7]', '[0.9]', '[0.95]', '[0.99]']
2021-07-27 19:26:13,949 - annb.408f02c65f - INFO - ['angular', 268435456, 'fht_crosspolytope']
2021-07-27 19:26:13,949 - annb.408f02c65f - INFO - Trying to instantiate ann_benchmarks.algorithms.puffinn.Puffinn(['angular', 268435456, 'fht_crosspolytope'])
2021-07-27 19:26:13,950 - annb.408f02c65f - INFO - got a train set of size (1183514 * 100)
2021-07-27 19:26:13,951 - annb.408f02c65f - INFO - got 10000 queries
2021-07-27 19:26:14,120 - annb.408f02c65f - INFO - Traceback (most recent call last):
2021-07-27 19:26:14,121 - annb.408f02c65f - INFO -   File "run_algorithm.py", line 3, in <module>
2021-07-27 19:26:14,122 - annb.408f02c65f - INFO -     run_from_cmdline()
2021-07-27 19:26:14,122 - annb.408f02c65f - INFO -   File "/home/app/ann_benchmarks/runner.py", line 211, in run_from_cmdline
2021-07-27 19:26:14,123 - annb.408f02c65f - INFO -     run(definition, args.dataset, args.count, args.runs, args.batch)
2021-07-27 19:26:14,124 - annb.408f02c65f - INFO -   File "/home/app/ann_benchmarks/runner.py", line 122, in run
2021-07-27 19:26:14,124 - annb.408f02c65f - INFO -     algo.fit(X_train)
2021-07-27 19:26:14,125 - annb.408f02c65f - INFO -   File "/home/app/ann_benchmarks/algorithms/puffinn.py", line 35, in fit
2021-07-27 19:26:14,126 - annb.408f02c65f - INFO -     self.index.rebuild()
2021-07-27 19:26:14,126 - annb.408f02c65f - INFO - ValueError: insufficient memory
2021-07-27 19:26:18,382 - annb.408f02c65f - ERROR - ['angular', 268435456, 'fht_crosspolytope']
Trying to instantiate ann_benchmarks.algorithms.puffinn.Puffinn(['angular', 268435456, 'fht_crosspolytope'])
got a train set of size (1183514 * 100)
got 10000 queries
Traceback (most recent call last):
  File "run_algorithm.py", line 3, in <module>
    run_from_cmdline()
  File "/home/app/ann_benchmarks/runner.py", line 211, in run_from_cmdline
    run(definition, args.dataset, args.count, args.runs, args.batch)
  File "/home/app/ann_benchmarks/runner.py", line 122, in run
    algo.fit(X_train)
  File "/home/app/ann_benchmarks/algorithms/puffinn.py", line 35, in fit
    self.index.rebuild()
ValueError: insufficient memory
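A rough size check on the numbers in this log may help narrow things down. The sketch below rests on two assumptions that are not confirmed anywhere in this thread: that the second constructor argument (`268435456`) is a memory budget in bytes for the index, and that the vectors are stored as 32-bit floats.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats

n_points, dim = 1183514, 100      # "got a train set of size (1183514 * 100)"
index_budget = 268435456          # second constructor argument; assumed to be bytes
container_limit = 9475463936      # "mem limit" from the "Created container" line

raw_bytes = n_points * dim * BYTES_PER_FLOAT32


def mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20


print(f"raw vectors:     {mib(raw_bytes):8.1f} MiB")
print(f"index budget:    {mib(index_budget):8.1f} MiB")
print(f"container limit: {mib(container_limit):8.1f} MiB")
print("raw vectors exceed index budget:", raw_bytes > index_budget)
```

If both assumptions hold, the raw vectors alone (about 451 MiB) are larger than the 256 MiB budget while staying far below the container limit, which would be consistent with the `ValueError` coming from the index's own budget rather than from Docker.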


DmitryKey avatar Jul 27 '21 17:07 DmitryKey

Hi @DmitryKey. Sorry for not following up on this further!

I think both the timeouts and the memory problems are known issues. Did you notice any additional problems when updating to Python 3.7? It seems necessary to me that we try to bump everything up to 3.7 or (better) 3.8.

maumueller avatar Oct 25 '21 08:10 maumueller