docker-spark
Error while running ML algorithms: No module named numpy
File "/usr/bin/spark-3.0.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in
Please help.
Have you installed a Python dependency manager and used it to install numpy? If not, that is probably your missing step.
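A quick way to confirm the diagnosis is to ask the interpreter that runs the job whether it can find numpy at all. This is a minimal sketch using only the standard library (`importlib.util.find_spec`); the helper name `missing_modules` is hypothetical, and the module list is an assumption you should extend with whatever else your job imports.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of `names` that this interpreter cannot import.

    find_spec() locates a top-level module without executing it, so the
    check is cheap and has no side effects.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: pyspark.ml needs numpy; add anything else your job imports.
    required = ["numpy"]
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(required) or "none")
```

Run it with the same `python3` that PySpark uses (for example inside the container), otherwise the answer describes a different environment.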
Hi @shwetamittal019,
Are you running the example within the python-template, or directly in the spark-shell interactively? If you use the python-template, you can add numpy as one of the dependencies in your requirements.txt file and it will be installed at build time:
https://github.com/big-data-europe/docker-spark/blob/bc3f2127ca035fa06f77bc4f44a8b9e1478346a6/template/python/Dockerfile#L8-L10
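Before rebuilding, it can help to confirm that numpy is actually declared in the requirements.txt the template copies in. The sketch below is a rough, hypothetical helper (`declared_packages` is not part of docker-spark or pip); it only understands the common forms such as `numpy`, `numpy==1.18.1` or `Cython>=0.29`, and skips comments, blank lines and option lines like `-r other.txt`.

```python
import re
from pathlib import Path


def declared_packages(requirements_text):
    """Return the lower-cased project names declared in a requirements.txt body."""
    names = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # blank line or pip option such as -r / --index-url
        # The project name is everything before the first specifier, extra or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


if __name__ == "__main__":
    req = Path("requirements.txt")  # assumed to sit next to where you run this
    if req.exists():
        print("numpy declared:", "numpy" in declared_packages(req.read_text()))
    else:
        print("no requirements.txt in", Path.cwd())
```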
Feel free to add more details so that we can help. Even better, share your use case so that we can reproduce it.
Best regards,
Hi @GezimSejdiu, I am also having trouble with this. I added numpy to requirements.txt, but when I start the container, the build fails while installing numpy with this error:
`Step 1/12 : FROM bde2020/spark-python-template:2.4.3-hadoop2.7
Executing 3 build triggers
---> Running in fc96aff6d8d3
Collecting Cython (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/f6/e3/293d7d18a64dde5e60f809c5c3354ee812af713b1679c74708f88986a6b6/Cython-0.29.23-py2.py3-none-any.whl (978kB)
Collecting numpy==1.18.1 (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/40/de/0ea5092b8bfd2e3aa6fdbb2e499a9f9adf810992884d414defc1573dca3f/numpy-1.18.1.zip (5.4MB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'error'
Complete output from command /usr/bin/python3.7 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp2ikrbto3:
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_bounded_integers.pyx.in
Processing numpy/random/_common.pyx
Processing numpy/random/_bit_generator.pyx
Processing numpy/random/_generator.pyx
Processing numpy/random/_philox.pyx
Processing numpy/random/mtrand.pyx
Processing numpy/random/_sfc64.pyx
Processing numpy/random/_pcg64.pyx
Processing numpy/random/_mt19937.pyx
Cythonizing sources
blas_opt_info:
blas_mkl_info:
customize UnixCCompiler
libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
blis_info:
libraries blis not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
openblas_info:
libraries openblas not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
atlas_3_10_blas_threads_info:
Setting PTATLAS=ATLAS
libraries tatlas not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
atlas_3_10_blas_info:
libraries satlas not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
atlas_blas_threads_info:
Setting PTATLAS=ATLAS
libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
atlas_blas_info:
libraries f77blas,cblas,atlas not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
accelerate_info:
NOT AVAILABLE
blas_info:
libraries blas not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
blas_src_info:
NOT AVAILABLE
NOT AVAILABLE
/bin/sh: svnversion: not found
non-existing path in 'numpy/distutils': 'site.cfg'
lapack_opt_info:
lapack_mkl_info:
libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
openblas_lapack_info:
libraries openblas not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
openblas_clapack_info:
libraries openblas,lapack not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
flame_info:
libraries flame not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
atlas_3_10_threads_info:
Setting PTATLAS=ATLAS
libraries lapack_atlas not found in /usr/local/lib
libraries tatlas,tatlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib
libraries tatlas,tatlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
NOT AVAILABLE
atlas_3_10_info:
libraries lapack_atlas not found in /usr/local/lib
libraries satlas,satlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib
libraries satlas,satlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_info'>
NOT AVAILABLE
atlas_threads_info:
Setting PTATLAS=ATLAS
libraries lapack_atlas not found in /usr/local/lib
libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib
libraries ptf77blas,ptcblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_threads_info'>
NOT AVAILABLE
atlas_info:
libraries lapack_atlas not found in /usr/local/lib
libraries f77blas,cblas,atlas not found in /usr/local/lib
libraries lapack_atlas not found in /usr/lib
libraries f77blas,cblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_info'>
NOT AVAILABLE
lapack_info:
libraries lapack not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE
lapack_src_info:
NOT AVAILABLE
NOT AVAILABLE
running dist_info
running build_src
build_src
building py_modules sources
creating build
creating build/src.linux-x86_64-3.7
creating build/src.linux-x86_64-3.7/numpy
creating build/src.linux-x86_64-3.7/numpy/distutils
building library "npymath" sources
Could not locate executable gfortran
Could not locate executable f95
Could not locate executable ifort
Could not locate executable ifc
Could not locate executable lf95
Could not locate executable pgfortran
Could not locate executable f90
Could not locate executable f77
Could not locate executable fort
Could not locate executable efort
Could not locate executable efc
Could not locate executable g77
Could not locate executable g95
Could not locate executable pathf95
Could not locate executable nagfor
don't know how to compile Fortran code on platform 'posix'
Running from numpy source directory.
setup.py:461: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
run_build = parse_setuppy_commands()
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
Optimized (vendor) Blas libraries are not found.
Falls back to netlib Blas library which has worse performance.
A better performance should be easily gained by switching
Blas library.
if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
Blas (http://www.netlib.org/blas/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [blas]) or by setting
the BLAS environment variable.
if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
Blas (http://www.netlib.org/blas/) sources not found.
Directories to search for the sources can be specified in the
numpy/distutils/site.cfg file (section [blas_src]) or by setting
the BLAS_SRC environment variable.
if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1730: UserWarning:
Lapack (http://www.netlib.org/lapack/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [lapack]) or by setting
the LAPACK environment variable.
return getattr(self, '_calc_info_{}'.format(name))()
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1730: UserWarning:
Lapack (http://www.netlib.org/lapack/) sources not found.
Directories to search for the sources can be specified in the
numpy/distutils/site.cfg file (section [lapack_src]) or by setting
the LAPACK_SRC environment variable.
return getattr(self, '_calc_info_{}'.format(name))()
/usr/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'define_macros'
warnings.warn(msg)
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 207, in <module>
main()
File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 197, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 69, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 166, in prepare_metadata_for_build_wheel
self.run_setup()
File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 259, in run_setup
self).run_setup(setup_script=setup_script)
File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 150, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 488, in <module>
setup_package()
File "setup.py", line 480, in setup_package
setup(**metadata)
File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/core.py", line 171, in setup
return old_setup(**new_attr)
File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/command/dist_info.py", line 31, in run
egg_info.run()
File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/egg_info.py", line 26, in run
self.run_command("build_src")
File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 146, in run
self.build_sources()
File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 157, in build_sources
self.build_library_sources(*libname_info)
File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 290, in build_library_sources
sources = self.generate_sources(sources, (lib_name, build_info))
File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 380, in generate_sources
source = func(extension, build_dir)
File "numpy/core/setup.py", line 661, in get_mathlib_info
raise RuntimeError("Broken toolchain: cannot link a simple C program")
RuntimeError: Broken toolchain: cannot link a simple C program
----------------------------------------
Command "/usr/bin/python3.7 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp2ikrbto3" failed with error code 1 in /tmp/pip-install-n0yoj555/numpy
You are using pip version 19.0.3, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR: Service 'train-model' failed to build: The command '/bin/sh -c cd /app && pip3 install -r requirements.txt' returned a non-zero code: 1`
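From what I have been able to find on Google, the container is probably missing gcc. pip downloads the numpy source archive (numpy-1.18.1.zip) and has to compile it, and the build stops with "RuntimeError: Broken toolchain: cannot link a simple C program", so the module cannot be installed.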
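To test the "missing compiler" theory directly, you can try to compile and link a trivial C program inside the container, which is similar in spirit to the probe behind numpy's "Broken toolchain" error (this is a sketch, not numpy's own code). The helper name `can_link_simple_c_program` and the compiler names tried (`cc`, `gcc`) are assumptions.

```python
import os
import shutil
import subprocess
import tempfile

# Smallest possible C program: it only has to compile and link.
PROBE_SOURCE = "int main(void) { return 0; }\n"


def can_link_simple_c_program(compilers=("cc", "gcc")):
    """Return True if any listed compiler can compile *and link* a trivial C file."""
    for name in compilers:
        compiler = shutil.which(name)
        if compiler is None:
            continue  # not on PATH, e.g. gcc was never installed in the image
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "probe.c")
            exe = os.path.join(tmp, "probe")
            with open(src, "w") as fh:
                fh.write(PROBE_SOURCE)
            result = subprocess.run(
                [compiler, src, "-o", exe],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
            if result.returncode == 0 and os.path.exists(exe):
                return True
    return False


if __name__ == "__main__":
    print("working C toolchain:", can_link_simple_c_program())
```

If this prints False even with gcc installed, the linker is usually missing the C library development files; on Alpine those typically come from `musl-dev`, which is why the Dockerfile further down installs it next to gcc.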
I am having the same issue.
I was able to install numpy by adding this line to my Dockerfile:
RUN apk add --no-cache py3-numpy
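The Alpine `py3-numpy` package installs a prebuilt numpy for the system `python3`, so it only helps if that is the interpreter running your PySpark code. A minimal sketch for checking this from inside the container, assuming you run it with the same `python3` your job uses; `numpy_report` is a hypothetical helper, while `PYSPARK_PYTHON` is the environment variable PySpark reads (when set) to choose its Python interpreter.

```python
import importlib.util
import os
import sys


def numpy_report():
    """Describe which interpreter is running and where numpy would be loaded from."""
    spec = importlib.util.find_spec("numpy")
    return {
        "interpreter": sys.executable,
        # If set, PySpark uses this interpreter instead of the default python.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        # None means numpy is not importable from this interpreter.
        "numpy_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in numpy_report().items():
        print(f"{key}: {value}")
```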
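You could also extend the `spark-submit` image and install the build dependencies before running `pip install`. You cannot do this with the Python template image, though: its build triggers (the "Executing 3 build triggers" step in the log above) run `pip3 install -r requirements.txt` right after FROM, before any RUN line of your own. That's why I decided to go with the submit image.
Something like this: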
FROM bde2020/spark-submit:3.1.1-hadoop3.2
# Add build dependencies for c-libraries (important for building numpy and other sci-libs)
RUN apk --no-cache add --virtual build-deps musl-dev linux-headers g++ gcc python3-dev
# Copy the requirements.txt first, for separate dependency resolving and downloading
COPY app/requirements.txt /app/
RUN cd /app \
    && pip3 install -r requirements.txt
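Besides a compiler, building C extensions such as numpy's needs the Python headers, which on Alpine come from the `python3-dev` package installed in the Dockerfile above. A small sketch to confirm they are present for the interpreter that runs `pip3`; the helper name `python_headers_present` is hypothetical, and the check relies only on the standard `sysconfig` module.

```python
import os
import sysconfig


def python_headers_present():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print("include dir:", sysconfig.get_paths()["include"])
    print("Python.h present:", python_headers_present())
```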
Alternatively, run this in a shell inside each of the containers:
apk --no-cache --update-cache add gcc gfortran python python-dev py-pip build-base wget freetype-dev libpng-dev openblas-dev
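This command also installs `openblas-dev`. In the build log earlier in the thread, numpy searched `/usr/local/lib` and `/usr/lib` for libraries such as openblas and lapack and reported NOT AVAILABLE before falling back. A rough sketch for checking whether those files now exist in the same directories; `find_library_files` is a hypothetical helper, and matching `lib<name>.so*` / `lib<name>.a` is an approximation of the search, not numpy's actual logic.

```python
import glob
import os

# The directories numpy's build searched in the log earlier in this thread.
SEARCH_DIRS = ("/usr/local/lib", "/usr/lib")


def find_library_files(name, search_dirs=SEARCH_DIRS):
    """Return paths of lib<name>.so* and lib<name>.a files found in search_dirs."""
    found = []
    for directory in search_dirs:
        for pattern in ("lib%s.so*" % name, "lib%s.a" % name):
            found.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return found


if __name__ == "__main__":
    for lib in ("openblas", "lapack"):
        print(lib, "->", find_library_files(lib) or "not found")
```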