
Error while running ML algorithms: No module named numpy

Open shwetamittal019 opened this issue 3 years ago • 7 comments

File "/usr/bin/spark-3.0.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

Please help

shwetamittal019 avatar Dec 23 '20 14:12 shwetamittal019

Have you installed a Python dependency manager and installed numpy with it? If not, that is probably your missing step.

danielschulz avatar Jan 11 '21 22:01 danielschulz
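A quick way to answer that question is to ask the interpreter itself. The following is a minimal sketch, not part of the docker-spark project: the helper name `module_available` is hypothetical, and it assumes you run it with the same Python interpreter that Spark uses inside the container.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Show which interpreter is being checked, since a container may ship several.
    print("interpreter:", sys.executable)
    for mod in ("numpy", "pyspark"):
        print(mod, "available" if module_available(mod) else "MISSING")
```

If numpy is reported as missing here, `pyspark.ml` will fail on import with the `ModuleNotFoundError` shown above.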

Hi @shwetamittal019 ,

Are you running the example within the python-template, or directly on spark-shell interactively? If via the python-template, you can add numpy as one of the dependencies in your requirements.txt file and it will be installed on build: https://github.com/big-data-europe/docker-spark/blob/bc3f2127ca035fa06f77bc4f44a8b9e1478346a6/template/python/Dockerfile#L8-L10

Feel free to comment more so that we can help. Or better, feel free to share your use-case so that we can also reproduce.

Best regards,

GezimSejdiu avatar Mar 22 '21 22:03 GezimSejdiu

Hi @GezimSejdiu, I am also having trouble with this. I added numpy to requirements.txt, but when I start the container, the build fails while numpy is being installed with this error:

`Step 1/12 : FROM bde2020/spark-python-template:2.4.3-hadoop2.7

Executing 3 build triggers

---> Running in fc96aff6d8d3
Collecting Cython (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f6/e3/293d7d18a64dde5e60f809c5c3354ee812af713b1679c74708f88986a6b6/Cython-0.29.23-py2.py3-none-any.whl (978kB)
Collecting numpy==1.18.1 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/40/de/0ea5092b8bfd2e3aa6fdbb2e499a9f9adf810992884d414defc1573dca3f/numpy-1.18.1.zip (5.4MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing wheel metadata: started
  Preparing wheel metadata: finished with status 'error'
  Complete output from command /usr/bin/python3.7 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp2ikrbto3:
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_bounded_integers.pyx.in
Processing numpy/random/_common.pyx
Processing numpy/random/_bit_generator.pyx
Processing numpy/random/_generator.pyx
Processing numpy/random/_philox.pyx
Processing numpy/random/mtrand.pyx
Processing numpy/random/_sfc64.pyx
Processing numpy/random/_pcg64.pyx
Processing numpy/random/_mt19937.pyx
Cythonizing sources
blas_opt_info:
blas_mkl_info:
customize UnixCCompiler
  libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

blis_info:
  libraries blis not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

openblas_info:
  libraries openblas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_3_10_blas_threads_info:
Setting PTATLAS=ATLAS
  libraries tatlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_3_10_blas_info:
  libraries satlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_blas_threads_info:
Setting PTATLAS=ATLAS
  libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_blas_info:
  libraries f77blas,cblas,atlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

accelerate_info:
  NOT AVAILABLE

blas_info:
  libraries blas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

blas_src_info:
  NOT AVAILABLE

  NOT AVAILABLE

/bin/sh: svnversion: not found
non-existing path in 'numpy/distutils': 'site.cfg'
lapack_opt_info:
lapack_mkl_info:
  libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

openblas_lapack_info:
  libraries openblas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

openblas_clapack_info:
  libraries openblas,lapack not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

flame_info:
  libraries flame not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_3_10_threads_info:
Setting PTATLAS=ATLAS
  libraries lapack_atlas not found in /usr/local/lib
  libraries tatlas,tatlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries tatlas,tatlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
  NOT AVAILABLE

atlas_3_10_info:
  libraries lapack_atlas not found in /usr/local/lib
  libraries satlas,satlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries satlas,satlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_info'>
  NOT AVAILABLE

atlas_threads_info:
Setting PTATLAS=ATLAS
  libraries lapack_atlas not found in /usr/local/lib
  libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries ptf77blas,ptcblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_threads_info'>
  NOT AVAILABLE

atlas_info:
  libraries lapack_atlas not found in /usr/local/lib
  libraries f77blas,cblas,atlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries f77blas,cblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_info'>
  NOT AVAILABLE

lapack_info:
  libraries lapack not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

lapack_src_info:
  NOT AVAILABLE

  NOT AVAILABLE

running dist_info
running build_src
build_src
building py_modules sources
creating build
creating build/src.linux-x86_64-3.7
creating build/src.linux-x86_64-3.7/numpy
creating build/src.linux-x86_64-3.7/numpy/distutils
building library "npymath" sources
Could not locate executable gfortran
Could not locate executable f95
Could not locate executable ifort
Could not locate executable ifc
Could not locate executable lf95
Could not locate executable pgfortran
Could not locate executable f90
Could not locate executable f77
Could not locate executable fort
Could not locate executable efort
Could not locate executable efc
Could not locate executable g77
Could not locate executable g95
Could not locate executable pathf95
Could not locate executable nagfor
don't know how to compile Fortran code on platform 'posix'
Running from numpy source directory.
setup.py:461: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
  run_build = parse_setuppy_commands()
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
    Optimized (vendor) Blas libraries are not found.
    Falls back to netlib Blas library which has worse performance.
    A better performance should be easily gained by switching
    Blas library.
  if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.
  if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
    Blas (http://www.netlib.org/blas/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [blas_src]) or by setting
    the BLAS_SRC environment variable.
  if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1730: UserWarning:
    Lapack (http://www.netlib.org/lapack/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [lapack]) or by setting
    the LAPACK environment variable.
  return getattr(self, '_calc_info_{}'.format(name))()
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1730: UserWarning:
    Lapack (http://www.netlib.org/lapack/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [lapack_src]) or by setting
    the LAPACK_SRC environment variable.
  return getattr(self, '_calc_info_{}'.format(name))()
/usr/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'define_macros'
  warnings.warn(msg)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 207, in <module>
    main()
  File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 197, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 69, in prepare_metadata_for_build_wheel
    return hook(metadata_directory, config_settings)
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 166, in prepare_metadata_for_build_wheel
    self.run_setup()
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 259, in run_setup
    self).run_setup(setup_script=setup_script)
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 150, in run_setup
    exec(compile(code, __file__, 'exec'), locals())
  File "setup.py", line 488, in <module>
    setup_package()
  File "setup.py", line 480, in setup_package
    setup(**metadata)
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/core.py", line 171, in setup
    return old_setup(**new_attr)
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/command/dist_info.py", line 31, in run
    egg_info.run()
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/egg_info.py", line 26, in run
    self.run_command("build_src")
  File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 146, in run
    self.build_sources()
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 157, in build_sources
    self.build_library_sources(*libname_info)
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 290, in build_library_sources
    sources = self.generate_sources(sources, (lib_name, build_info))
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 380, in generate_sources
    source = func(extension, build_dir)
  File "numpy/core/setup.py", line 661, in get_mathlib_info
    raise RuntimeError("Broken toolchain: cannot link a simple C program")
RuntimeError: Broken toolchain: cannot link a simple C program

----------------------------------------

Command "/usr/bin/python3.7 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp2ikrbto3" failed with error code 1 in /tmp/pip-install-n0yoj555/numpy
You are using pip version 19.0.3, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR: Service 'train-model' failed to build: The command '/bin/sh -c cd /app && pip3 install -r requirements.txt' returned a non-zero code: 1 `

I guess the container is missing gcc (from what I have been able to find on Google), so pip cannot build and install this module.

j-juric avatar Apr 23 '21 20:04 j-juric
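The guess about a missing compiler can be checked from inside the container. The sketch below is an illustration only: `find_compilers` is a hypothetical helper and the candidate names are assumptions drawn from the tools mentioned in the log above. Note that the "Broken toolchain: cannot link a simple C program" error can also occur when a compiler is present but the C library headers are not, so an empty result here is a strong hint rather than a complete diagnosis.

```python
import shutil


def find_compilers(candidates=("gcc", "cc", "g++", "gfortran")):
    """Map each candidate compiler name to its path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    for name, path in find_compilers().items():
        print(f"{name}: {path or 'not found'}")
```

If every entry prints "not found", pip has no way to build numpy from its source archive in that image.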

I am having the same issue as well.

Philip-os avatar Apr 27 '21 08:04 Philip-os

I was able to install numpy by adding this line to my Dockerfile:

RUN apk add --no-cache py3-numpy

j-juric avatar Apr 27 '21 11:04 j-juric
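A likely reason the `apk` package works where `pip3 install` fails: images that use `apk` are Alpine-based and ship musl libc, while the prebuilt manylinux wheels on PyPI target glibc, so pip falls back to compiling numpy from source. The sketch below is a rough heuristic, not a definitive test: `uses_glibc` is a hypothetical helper, and it assumes that `platform.libc_ver()` reports an empty library name when the interpreter is not linked against glibc.

```python
import platform


def uses_glibc() -> bool:
    """Heuristic: True if the running interpreter appears to be linked against glibc."""
    lib, _version = platform.libc_ver()
    return lib == "glibc"


if __name__ == "__main__":
    if uses_glibc():
        print("glibc detected: manylinux wheels should be usable")
    else:
        print("glibc not detected: pip may need to build packages from source")
```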

You could also extend the spark-submit image and install build dependencies before running pip install. You cannot do this with the Python template image, though, which is why I decided to go with the submit image.

Something like this:

FROM bde2020/spark-submit:3.1.1-hadoop3.2

# Add build dependencies for c-libraries (important for building numpy and other sci-libs)
RUN apk --no-cache add --virtual build-deps musl-dev linux-headers g++ gcc python3-dev

# Copy the requirements.txt first, for separate dependency resolving and downloading
COPY app/requirements.txt /app/
RUN cd /app \
    && pip3 install -r requirements.txt

dusandjovanovic avatar Jun 20 '21 11:06 dusandjovanovic

Run this in a shell on each of the containers:

apk --no-cache --update-cache add gcc gfortran python python-dev py-pip build-base wget freetype-dev libpng-dev openblas-dev

devAmoghS avatar Aug 26 '22 08:08 devAmoghS