
Fails to install on Azure Databricks Cluster

Open sachinwadhwa opened this issue 2 years ago • 4 comments

Library installation attempted on the driver node of cluster 0531-095737-pc8ifbl4 and failed. Please refer to the following error message to fix the library or contact Databricks support.

Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error Message: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, soda-spark, --disable-pip-version-check) exited with code 1.

ERROR: Command errored out with exit status 1:
command: /databricks/python3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/setup.py'"'"'; __file__='"'"'/tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-0uq392_j
cwd: /tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/
Complete output (29 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/sasl
copying sasl/__init__.py -> build/lib.linux-x86_64-3.8/sasl
running egg_info
writing sasl.egg-info/PKG-INFO
writing dependency_links to sasl.egg-info/dependency_links.txt
writing requirements to sasl.egg-info/requires.txt
writing top-level names to sasl.egg-info/top_level.txt
reading manifest file 'sasl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'sasl.egg-info/SOURCES.txt'
copying sasl/saslwrapper.cpp -> build/lib.linux-x86_64-3.8/sasl
copying sasl/saslwrapper.h -> build/lib.linux-x86_64-3.8/sasl
copying sasl/saslwrapper.pyx -> build/lib.linux-x86_64-3.8/sasl
running build_ext
building 'sasl.saslwrapper' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/sasl
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Isasl -I/databricks/python3/include -I/usr/include/python3.8 -c sasl/saslwrapper.cpp -o build/temp.linux-x86_64-3.8/sasl/saslwrapper.o
In file included from sasl/saslwrapper.cpp:629:
sasl/saslwrapper.h:22:10: fatal error: sasl/sasl.h: No such file or directory
22 | #include <sasl/sasl.h>
| ^~~~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

ERROR: Failed building wheel for sasl
ERROR: Command errored out with exit status 1:
command: /databricks/python3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/setup.py'"'"'; __file__='"'"'/tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-_6sr1coa/install-record.txt --single-version-externally-managed --compile --install-headers /databricks/python3/include/site/python3.8/sasl
cwd: /tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/
Complete output (29 lines):
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/sasl
copying sasl/__init__.py -> build/lib.linux-x86_64-3.8/sasl
running egg_info
writing sasl.egg-info/PKG-INFO
writing dependency_links to sasl.egg-info/dependency_links.txt
writing requirements to sasl.egg-info/requires.txt
writing top-level names to sasl.egg-info/top_level.txt
reading manifest file 'sasl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'sasl.egg-info/SOURCES.txt'
copying sasl/saslwrapper.cpp -> build/lib.linux-x86_64-3.8/sasl
copying sasl/saslwrapper.h -> build/lib.linux-x86_64-3.8/sasl
copying sasl/saslwrapper.pyx -> build/lib.linux-x86_64-3.8/sasl
running build_ext
building 'sasl.saslwrapper' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/sasl
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Isasl -I/databricks/python3/include -I/usr/include/python3.8 -c sasl/saslwrapper.cpp -o build/temp.linux-x86_64-3.8/sasl/saslwrapper.o
In file included from sasl/saslwrapper.cpp:629:
sasl/saslwrapper.h:22:10: fatal error: sasl/sasl.h: No such file or directory
22 | #include <sasl/sasl.h>
| ^~~~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /databricks/python3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/setup.py'"'"'; __file__='"'"'/tmp/pip-install-hk_a28h0/sasl_22bdc11526b24a309f12b898eb2ce262/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-_6sr1coa/install-record.txt --single-version-externally-managed --compile --install-headers /databricks/python3/include/site/python3.8/sasl
Check the logs for full command output.

sachinwadhwa avatar Jun 10 '22 13:06 sachinwadhwa

Hi @sachinwadhwa, that is annoying, this should not happen.

The build of the sasl Python package expects the SASL C header (sasl/sasl.h) to be on the system, but it is not. I think sasl is a dependency of soda-sql-spark (which is a dependency of soda-spark). A proper solution is to make that dependency optional in soda-sql-spark, because whether it is required depends on the connection method that is used. In soda-spark we do not require sasl and thus we can exclude that dependency.

However, that is a long route to a solution. @vijaykiran: I expect other users have run into the same problem; do you know if this has happened before? A short-term solution is to install libsasl2-dev, which provides the missing header, before installing soda-spark: sudo apt-get install libsasl2-dev
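To check whether the workaround has taken effect on a given node, a small preflight script can look for the header the compiler could not find. This is only a sketch: the function name `find_sasl_header` is made up for illustration, and the two include directories are an assumption about where a Debian/Ubuntu-based image keeps system headers, so adjust them if your cluster image differs.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed default include directories on a Debian/Ubuntu-based image;
# adjust if the compiler on your cluster searches elsewhere.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_sasl_header(include_dirs: Iterable[str] = DEFAULT_INCLUDE_DIRS) -> Optional[Path]:
    """Return the path to sasl/sasl.h in the first directory that has it, else None."""
    for directory in include_dirs:
        candidate = Path(directory) / "sasl" / "sasl.h"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    header = find_sasl_header()
    if header is None:
        # Same condition that produced "fatal error: sasl/sasl.h: No such file or directory".
        print("sasl/sasl.h not found; install libsasl2-dev before installing soda-spark")
    else:
        print(f"Found {header}; the header the sasl build was missing is present")
```

Finding the header only rules out this particular compile error; it does not guarantee that the rest of the installation succeeds.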

JCZuurmond avatar Jun 11 '22 07:06 JCZuurmond

@vijaykiran : How is this issue progressing? We are running into the same problem

JCZuurmond avatar Jul 25 '22 12:07 JCZuurmond

Is there anything new on this? I would like to use Soda in Databricks, but this issue and the workaround make it not really usable.

bombercorny avatar Mar 27 '23 07:03 bombercorny

@bombercorny It seems that this package is, or soon will be, deprecated in favor of soda-core. I suggest using the soda-core-spark-df or soda-core-spark package with Databricks, depending on your use case.

Zieg avatar Aug 15 '23 11:08 Zieg