SynapseML
[BUG] Installation Troubles with Spark Submit
SynapseML version
com.microsoft.azure:synapseml_2.12:0.10.2
System information
- Language version: Python 2.7.18
- Spark version: 3.3.1
- Spark platform: PySpark 3.8.10
- OS: Ubuntu
Describe the problem
I tried using the SynapseML package from PySpark code in order to use the LightGBM module. The SynapseML package loaded and worked properly during December 2022. Now it no longer works: it gives the error "No module named 'synapse.ml'" (shown below) on the Ubuntu system when run through spark-submit.
The same code works in a Windows environment using the Python IDLE shell, but when I run it on Windows via spark-submit it shows the same "no module found" error.
Why does the same package work in the Windows Python IDLE but not through spark-submit, on both Windows and Ubuntu?
Kindly help me get this package working through spark-submit on Windows and Ubuntu.
ERROR MESSAGE:
12:03:52 :Package Loaded Finished
Traceback (most recent call last):
File "/home/mw-user/lgbmdemo.py", line 3, in
Code to reproduce issue
import pyspark
spark = SparkSession.builder.appName("sample").config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.10.2")
import synapse.ml
from synapse.ml.lightgbm import *
from synapse.ml.train import ComputeModelStatistics
Other info / logs
12:03:52 :Package Loaded Finished
Traceback (most recent call last):
File "/home/mw-user/lgbmdemo.py", line 3, in
What component(s) does this bug affect?
- [ ] area/cognitive: Cognitive project
- [ ] area/core: Core project
- [ ] area/deep-learning: DeepLearning project
- [X] area/lightgbm: Lightgbm project
- [ ] area/opencv: Opencv project
- [ ] area/vw: VW project
- [ ] area/website: Website
- [ ] area/build: Project build system
- [ ] area/notebooks: Samples under notebooks folder
- [X] area/docker: Docker usage
- [ ] area/models: models related issue
What language(s) does this bug affect?
- [ ] language/scala: Scala source code
- [X] language/python: Pyspark APIs
- [ ] language/r: R APIs
- [ ] language/csharp: .NET APIs
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/synapse: Azure Synapse integrations
- [ ] integrations/azureml: Azure ML integrations
- [ ] integrations/databricks: Databricks integrations
Hey @DrVajiha :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from the SynapseML team will triage this issue soon. We appreciate your patience.
Hi @DrVajiha, you need to run
spark = SparkSession.builder.appName("sample").config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.10.2").getOrCreate()
with the final action 'getOrCreate()' in order to actually start the Spark session. Otherwise the package is never downloaded.
Hi @serena-ruan, I did include getOrCreate() in my code when running it; I only missed it when typing up this error report. Kindly help me use the SynapseML package with spark-submit.
My actual code:
import pyspark
spark = SparkSession.builder.appName("sample").config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.10.2").config("spark.executor.memory","4g").config("spark.executor.cores","2").config("num-executors","4").config("spark.driver.memory","4g").getOrCreate()
import synapse.ml
from synapse.ml.lightgbm import *
from synapse.ml.train import ComputeModelStatistics
Could you paste your logs here (especially the logs from starting the Spark session) so I can help debug? I could successfully install v0.10.2 on my Ubuntu machine.
Also, please make sure you use Spark 3.2 when installing v0.10.2.
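The version requirement above can also be checked programmatically. Below is a hypothetical helper (not part of SynapseML; the compatibility table is an assumption covering only the release discussed in this thread) that verifies a Spark version string against the major.minor line a SynapseML release targets:

```python
# Hypothetical helper: map a SynapseML release to the Spark major.minor
# line it was built for, and check a running Spark version against it.
# Only v0.10.2 (targets Spark 3.2, per this thread) is listed here.
SYNAPSEML_SPARK_COMPAT = {
    "0.10.2": "3.2",
}

def is_compatible(synapseml_version: str, spark_version: str) -> bool:
    """Return True if spark_version (e.g. '3.2.2') is on the
    major.minor line required by the given SynapseML release."""
    required = SYNAPSEML_SPARK_COMPAT.get(synapseml_version)
    if required is None:
        return False  # unknown release: treat as incompatible
    major_minor = ".".join(spark_version.split(".")[:2])
    return major_minor == required

# The reporter's original setup (Spark 3.3.1) fails the check;
# Spark 3.2.2 passes it.
print(is_compatible("0.10.2", "3.3.1"))  # False
print(is_compatible("0.10.2", "3.2.2"))  # True
```

In a real script you would feed `spark.version` from the live session into the second argument.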
LOG:
23/02/06 12:03:47 WARN Utils: Your hostname, mwuser-Inspiron-15-3511 resolves to a loopback address: 127.0.1.1; using ...* instead (on interface wlp1s0)
23/02/06 12:03:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Loading
12:03:49 :Package Loaded
23/02/06 12:03:49 INFO SparkContext: Running Spark version 3.3.1
23/02/06 12:03:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/02/06 12:03:50 INFO ResourceUtils: ==============================================================
23/02/06 12:03:50 INFO ResourceUtils: No custom resources configured for spark.driver.
23/02/06 12:03:50 INFO ResourceUtils: ==============================================================
23/02/06 12:03:50 INFO SparkContext: Submitted application: MyApp
23/02/06 12:03:50 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 3072, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/02/06 12:03:50 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
23/02/06 12:03:50 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/02/06 12:03:50 INFO SecurityManager: Changing view acls to: mw-user
23/02/06 12:03:50 INFO SecurityManager: Changing modify acls to: mw-user
23/02/06 12:03:50 INFO SecurityManager: Changing view acls groups to:
23/02/06 12:03:50 INFO SecurityManager: Changing modify acls groups to:
23/02/06 12:03:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mw-user); groups with view permissions: Set(); users with modify permissions: Set(mw-user); groups with modify permissions: Set()
23/02/06 12:03:50 INFO Utils: Successfully started service 'sparkDriver' on port 34941.
23/02/06 12:03:51 INFO SparkEnv: Registering MapOutputTracker
23/02/06 12:03:51 INFO SparkEnv: Registering BlockManagerMaster
23/02/06 12:03:51 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/02/06 12:03:51 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/02/06 12:03:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/02/06 12:03:51 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e4667682-c82b-4b2a-937d-16c984cf2e7e
23/02/06 12:03:51 INFO MemoryStore: MemoryStore started with capacity 2004.6 MiB
23/02/06 12:03:51 INFO SparkEnv: Registering OutputCommitCoordinator
23/02/06 12:03:51 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
23/02/06 12:03:51 INFO Utils: Successfully started service 'SparkUI' on port 4041.
23/02/06 12:03:51 INFO Executor: Starting executor ID driver on host ...*
23/02/06 12:03:51 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/02/06 12:03:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45567.
23/02/06 12:03:51 INFO NettyBlockTransferService: Server created on ...:45567
23/02/06 12:03:51 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/02/06 12:03:51 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.10.114, 45567, None)
23/02/06 12:03:51 INFO BlockManagerMasterEndpoint: Registering block manager ...:45567 with 2004.6 MiB RAM, BlockManagerId(driver, ...,45567, None)
23/02/06 12:03:51 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ..., 45567, None)
23/02/06 12:03:51 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ..., 45567, None)
12:03:52 :Package Loaded Finished
Traceback (most recent call last):
File "/home/mw-user/ldbmdemo.py", line 26, in
Hi @DrVajiha, I see that you're using Spark 3.3, but our v0.10.2 release actually targets Spark 3.2. Could you try installing Spark 3.2 instead?
@DrVajiha I also don't see any installation logs in the logs you sent. If you are using spark-submit, please follow the instructions here: https://github.com/microsoft/SynapseML#spark-submit
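For spark-submit, the package coordinate can be passed on the command line instead of inside the script, so the jars are resolved before the Python code runs. A sketch (the script path is taken from this thread, and the `--conf` values mirror the reporter's settings; adjust both as needed):

```shell
# Pass the SynapseML coordinate via --packages so spark-submit resolves
# the jars (and their transitive dependencies) before launching the app.
spark-submit \
  --packages com.microsoft.azure:synapseml_2.12:0.10.2 \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.driver.memory=4g \
  /home/mw-user/lgbmdemo.py
```

This is a command-line fragment, not a tested script; with this approach the `spark.jars.packages` config inside the Python file becomes unnecessary.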
@serena-ruan, @mhamilton723 I have changed my Spark version to 3.2.2 and tried installing the LightGBM SynapseML package using spark-submit. I'm still facing the same issue: `import synapse.ml` raises ModuleNotFoundError: No module named 'synapse.ml'.
@DrVajiha -- I do not see the jar download actually happening in the logs you shared. It seems the jars are not getting downloaded in your environment, which is why you're seeing the module-not-found error. The logs should print something like:
com.microsoft.azure#synapseml_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-8d72dfbf-69d4-4ca9-b101-d0f871f39a16;1.0
confs: [default]
found com.microsoft.azure#synapseml_2.12;0.10.2 in central <<<<----- PACKAGE FOUND
found com.microsoft.azure#synapseml-core_2.12;0.10.2 in central <<<<----- PACKAGE FOUND
found org.scalactic#scalactic_2.12;3.2.14 in central
found org.scala-lang#scala-reflect;2.12.15 in central
found io.spray#spray-json_2.12;1.3.5 in central
found com.jcraft#jsch;0.1.54 in central
found org.apache.httpcomponents.client5#httpclient5;5.1.3 in central
found org.apache.httpcomponents.core5#httpcore5;5.1.3 in central
found org.apache.httpcomponents.core5#httpcore5-h2;5.1.3 in central
found org.slf4j#slf4j-api;1.7.25 in central
found commons-codec#commons-codec;1.15 in central
found org.apache.httpcomponents#httpmime;4.5.13 in central
found org.apache.httpcomponents#httpclient;4.5.13 in central
found org.apache.httpcomponents#httpcore;4.4.13 in central
found commons-logging#commons-logging;1.2 in central
found com.linkedin.isolation-forest#isolation-forest_3.2.0_2.12;2.0.8 in central
found com.chuusai#shapeless_2.12;2.3.2 in central
found org.typelevel#macro-compat_2.12;1.1.1 in central
found org.apache.spark#spark-avro_2.12;3.2.0 in central
found org.tukaani#xz;1.8 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.testng#testng;6.8.8 in central
found org.beanshell#bsh;2.0b4 in central
found com.beust#jcommander;1.27 in central
found com.microsoft.azure#synapseml-deep-learning_2.12;0.10.2 in central
found com.microsoft.azure#synapseml-opencv_2.12;0.10.2 in central
found org.openpnp#opencv;3.2.0-1 in central
found com.microsoft.azure#onnx-protobuf_2.12;0.9.1 in central
found com.microsoft.cntk#cntk;2.4 in central
found com.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 in central
found com.microsoft.azure#synapseml-cognitive_2.12;0.10.2 in central
found com.microsoft.cognitiveservices.speech#client-jar-sdk;1.14.0 in central
found com.microsoft.azure#synapseml-vw_2.12;0.10.2 in central
found com.github.vowpalwabbit#vw-jni;8.9.1 in central
downloading https://repo1.maven.org/maven2/com/microsoft/azure/synapseml_2.12/0.10.2/synapseml_2.12-0.10.2.jar ...
[SUCCESSFUL ] com.microsoft.azure#synapseml_2.12;0.10.2!synapseml_2.12.jar (18ms) <<<<----- PACKAGE INSTALLED
downloading https://repo1.maven.org/maven2/com/microsoft/azure/synapseml-core_2.12/0.10.2/synapseml-core_2.12-0.10.2.jar ...
[SUCCESSFUL ] com.microsoft.azure#synapseml-core_2.12;0.10.2!synapseml-core_2.12.jar (490ms)
downloading https://repo1.maven.org/maven2/com/microsoft/azure/synapseml-deep-learning_2.12/0.10.2/synapseml-deep-learning_2.12-0.10.2.jar ... <<<<----- PACKAGE INSTALLED
[SUCCESSFUL ] com.microsoft.azure#synapseml-deep-learning_2.12;0.10.2!synapseml-deep-learning_2.12.jar (86ms)
downloading https://repo1.maven.org/maven2/com/microsoft/azure/synapseml-cognitive_2.12/0.10.2/synapseml-cognitive_2.12-0.10.2.jar ...
[SUCCESSFUL ] com.microsoft.azure#synapseml-cognitive_2.12;0.10.2!synapseml-cognitive_2.12.jar (487ms)
downloading https://repo1.maven.org/maven2/com/microsoft/azure/synapseml-vw_2.12/0.10.2/synapseml-vw_2.12-0.10.2.jar ...
[SUCCESSFUL ] com.microsoft.azure#synapseml-vw_2.12;0.10.2!synapseml-vw_2.12.jar (84ms)
downloading https://repo1.maven.org/maven2/com/microsoft/azure/synapseml-lightgbm_2.12/0.10.2/synapseml-lightgbm_2.12-0.10.2.jar ...
[SUCCESSFUL ] com.microsoft.azure#synapseml-lightgbm_2.12;0.10.2!synapseml-lightgbm_2.12.jar (111ms)
downloading https://repo1.maven.org/maven2/com/microsoft/azure/synapseml-opencv_2.12/0.10.2/synapseml-opencv_2.12-0.10.2.jar ...
[SUCCESSFUL ] com.microsoft.azure#synapseml-opencv_2.12;0.10.2!synapseml-opencv_2.12.jar (28ms)
downloading https://repo1.maven.org/maven2/com/microsoft/azure/onnx-protobuf_2.12/0.9.1/onnx-protobuf_2.12-0.9.1-assembly.jar ...
[SUCCESSFUL ] com.microsoft.azure#onnx-protobuf_2.12;0.9.1!onnx-protobuf_2.12.jar (392ms)
downloading https://repo1.maven.org/maven2/com/microsoft/cntk/cntk/2.4/cntk-2.4.jar ...
[SUCCESSFUL ] com.microsoft.cntk#cntk;2.4!cntk.jar (53122ms)
downloading https://repo1.maven.org/maven2/com/microsoft/onnxruntime/onnxruntime_gpu/1.8.1/onnxruntime_gpu-1.8.1.jar ...
[SUCCESSFUL ] com.microsoft.onnxruntime#onnxruntime_gpu;1.8.1!onnxruntime_gpu.jar (32420ms)
downloading https://repo1.maven.org/maven2/org/openpnp/opencv/3.2.0-1/opencv-3.2.0-1.jar ...
[SUCCESSFUL ] org.openpnp#opencv;3.2.0-1!opencv.jar(bundle) (17325ms)
downloading https://repo1.maven.org/maven2/com/microsoft/cognitiveservices/speech/client-jar-sdk/1.14.0/client-jar-sdk-1.14.0.jar ...
[SUCCESSFUL ] com.microsoft.cognitiveservices.speech#client-jar-sdk;1.14.0!client-jar-sdk.jar (2804ms)
downloading https://repo1.maven.org/maven2/com/github/vowpalwabbit/vw-jni/8.9.1/vw-jni-8.9.1.jar ...
[SUCCESSFUL ] com.github.vowpalwabbit#vw-jni;8.9.1!vw-jni.jar (990ms)
downloading https://repo1.maven.org/maven2/com/microsoft/ml/lightgbm/lightgbmlib/3.2.110/lightgbmlib-3.2.110.jar ...
[SUCCESSFUL ] com.microsoft.ml.lightgbm#lightgbmlib;3.2.110!lightgbmlib.jar (682ms)
:: resolution report :: resolve 2718ms :: artifacts dl 109050ms
:: modules in use:
com.beust#jcommander;1.27 from central in [default]
com.chuusai#shapeless_2.12;2.3.2 from central in [default]
com.github.vowpalwabbit#vw-jni;8.9.1 from central in [default]
com.jcraft#jsch;0.1.54 from central in [default]
com.linkedin.isolation-forest#isolation-forest_3.2.0_2.12;2.0.8 from central in [default]
com.microsoft.azure#onnx-protobuf_2.12;0.9.1 from central in [default]
com.microsoft.azure#synapseml-cognitive_2.12;0.10.2 from central in [default]
com.microsoft.azure#synapseml-core_2.12;0.10.2 from central in [default]
com.microsoft.azure#synapseml-deep-learning_2.12;0.10.2 from central in [default]
com.microsoft.azure#synapseml-lightgbm_2.12;0.10.2 from central in [default]
com.microsoft.azure#synapseml-opencv_2.12;0.10.2 from central in [default]
com.microsoft.azure#synapseml-vw_2.12;0.10.2 from central in [default]
com.microsoft.azure#synapseml_2.12;0.10.2 from central in [default]
com.microsoft.cntk#cntk;2.4 from central in [default]
com.microsoft.cognitiveservices.speech#client-jar-sdk;1.14.0 from central in [default]
com.microsoft.ml.lightgbm#lightgbmlib;3.2.110 from central in [default]
com.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 from central in [default]
commons-codec#commons-codec;1.15 from central in [default]
commons-logging#commons-logging;1.2 from central in [default]
io.spray#spray-json_2.12;1.3.5 from central in [default]
org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
org.apache.httpcomponents#httpcore;4.4.13 from central in [default]
org.apache.httpcomponents#httpmime;4.5.13 from central in [default]
org.apache.httpcomponents.client5#httpclient5;5.1.3 from central in [default]
org.apache.httpcomponents.core5#httpcore5;5.1.3 from central in [default]
org.apache.httpcomponents.core5#httpcore5-h2;5.1.3 from central in [default]
org.apache.spark#spark-avro_2.12;3.2.0 from central in [default]
org.beanshell#bsh;2.0b4 from central in [default]
org.openpnp#opencv;3.2.0-1 from central in [default]
org.scala-lang#scala-reflect;2.12.15 from central in [default]
org.scalactic#scalactic_2.12;3.2.14 from central in [default]
org.slf4j#slf4j-api;1.7.25 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.testng#testng;6.8.8 from central in [default]
org.tukaani#xz;1.8 from central in [default]
org.typelevel#macro-compat_2.12;1.1.1 from central in [default]
:: evicted modules:
commons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.15] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 37 | 14 | 14 | 1 || 36 | 14 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-8d72dfbf-69d4-4ca9-b101-d0f871f39a16
confs: [default]
14 artifacts copied, 22 already retrieved (404832kB/473ms)
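When resolution succeeds, the artifacts end up in the local Ivy cache on the submitting machine. A hypothetical diagnostic (it assumes the default cache location, `~/.ivy2`, which spark-submit uses unless `spark.jars.ivy` is overridden) to check whether the SynapseML jars were ever actually downloaded:

```python
# Hypothetical diagnostic: list any SynapseML jars present in the local
# Ivy cache. An empty result on the submitting machine suggests the
# spark.jars.packages resolution never ran at all.
from pathlib import Path

def downloaded_synapseml_jars(ivy_root: str = "~/.ivy2") -> list:
    """Return sorted file names of synapseml_2.12 jars under ivy_root."""
    root = Path(ivy_root).expanduser()
    if not root.exists():
        return []  # no cache directory: the download never happened
    return sorted(p.name for p in root.rglob("synapseml*_2.12*.jar"))

print(downloaded_synapseml_jars())
```

If this prints an empty list after a run that should have installed the package, the session was likely created without the package config taking effect (for example, a builder chain missing `getOrCreate()`).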
Could you share/paste the entire logs from running the PySpark code @mhamilton723 mentioned, so we can see whether the jar package was downloaded?
I'm sorry to intrude, but I have much the same problem: some jars are downloaded, but never the ones I specify in the SparkSession config, no matter which jars I specify. Do you know what the problem might be?