SynapseML
LightGBMClassificationModel.fit raises py4j.protocol.Py4JNetworkError: Answer from Java side is empty
Describe the bug
LightGBMClassificationModel.fit cannot handle larger data volumes. It fails without anything even being collected at the driver.
Calling LightGBMClassificationModel.fit on data of shape (10000, 241) - it executes perfectly
Calling LightGBMClassificationModel.fit on data of shape (100000, 241) - it executes perfectly
Calling LightGBMClassificationModel.fit on data of shape (1000000, 241) - the error shows up
I never get WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
To Reproduce I can upload a complete example if necessary, but since this error shows up only when I read all the rows in my data (all other configuration stays the same), I hope that will not be necessary. Please let me know if it is needed.
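For illustration, a minimal sketch of the kind of pipeline involved (the CSV path matches the stack trace below, but the label column, assembler step, and parameters are placeholders rather than the actual project code, and it assumes the spark session from the Info section below):

from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMClassifier

# Placeholder column names; the real data has 241 columns and up to 1,000,000 rows.
df = spark.read.csv("./mar19/training_data.csv", header=True, inferSchema=True)
feature_cols = [c for c in df.columns if c != "label"]
df = VectorAssembler(inputCols=feature_cols, outputCol="features").transform(df)

# Fails with the Py4JNetworkError below once the full data set is read.
model = LightGBMClassifier(featuresCol="features", labelCol="label").fit(df)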
Expected behavior The execution may become slow if it has to switch to disk caching, but it should not fail, IMO.
Info (please complete the following information): Python 3.9.7, pyspark 3.2.0 (pip install pyspark). Spark session:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.4") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .config("spark.driver.memory", "8g") \
    .getOrCreate()
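For additional context, a small diagnostic snippet (an assumption of how one might inspect the setup before calling fit, not part of the original report; df is the training DataFrame from the sketch above):

# Partition count that fit will train over, and the configured driver heap.
print(df.rdd.getNumPartitions())
print(spark.sparkContext.getConf().get("spark.driver.memory"))  # "8g" per the session config above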
Stacktrace
----------------------------------------=====> (8 + 5) / 13]
Exception occurred during processing of request from ('127.0.0.1', 35340)
Traceback (most recent call last):
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/socketserver.py", line 316, in _handle_request_noblock
self.process_request(request, client_address)
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/socketserver.py", line 347, in process_request
self.finish_request(request, client_address)
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/socketserver.py", line 360, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/socketserver.py", line 747, in __init__
self.handle()
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/site-packages/pyspark/accumulators.py", line 262, in handle
poll(accum_updates)
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/site-packages/pyspark/accumulators.py", line 235, in poll
if func():
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/site-packages/pyspark/accumulators.py", line 239, in accum_updates
num_updates = read_int(self.rfile)
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/site-packages/pyspark/serializers.py", line 564, in read_int
raise EOFError
EOFError
----------------------------------------
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/site-packages/py4j/clientserver.py", line 480, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
File "/home/nitin/miniconda3/envs/pyspark/lib/python3.9/site-packages/py4j/clientserver.py", line 503, in send_command
raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
/tmp/ipykernel_1227645/3903106392.py in <module>
1 # lp = LineProfiler()
----> 2 scores = est.train_evaluate_cv('./mar19/training_data.csv', 'hash_CR_ACCOUNT_NBR', \
3 'flag__6_months', None, save_results=True,\
4 evaluate = True)
~/pymonsoon/./ml_auto_spark/cv_estimator.py in train_evaluate_cv(self, data_path, index, label, nrows, save_results, evaluate)
308 df_modeling = self.prepare_modeling_data(df, self.n_splits, train=True).persist()
309 print("Training Models")
--> 310 self.train_cv(df_modeling)
311 print("Computing CV save artefacts")
312 df_post_prediction = self.predict_oof_cv(df_modeling, evaluate=evaluate)
~/pymonsoon/./ml_auto_spark/cv_estimator.py in train_cv(self, df)
131 self.run_params.update(update_params)
132 model = self.model.setParams(**self.run_params)
--> 133 model = model.fit(df)
134 self.trained_models.append(model)
135
~/miniconda3/envs/pyspark/lib/python3.9/site-packages/pyspark/ml/base.py in fit(self, dataset, params)
159 return self.copy(params)._fit(dataset)
160 else:
--> 161 return self._fit(dataset)
162 else:
163 raise TypeError("Params must be either a param map or a list/tuple of param maps, "
/tmp/spark-6102c3e3-6d12-4007-8b72-8a4d20f8e325/userFiles-a7fdb181-3cd5-4dd2-b745-f59b96544e25/com.microsoft.azure_synapseml-lightgbm_2.12-0.9.4.jar/synapse/ml/lightgbm/LightGBMClassifier.py in _fit(self, dataset)
1445
1446 def _fit(self, dataset):
-> 1447 java_model = self._fit_java(dataset)
1448 return self._create_model(java_model)
1449
~/miniconda3/envs/pyspark/lib/python3.9/site-packages/pyspark/ml/wrapper.py in _fit_java(self, dataset)
330 """
331 self._transfer_params_to_java()
--> 332 return self._java_obj.fit(dataset._jdf)
333
334 def _fit(self, dataset):
~/miniconda3/envs/pyspark/lib/python3.9/site-packages/py4j/java_gateway.py in __call__(self, *args)
1307
1308 answer = self.gateway_client.send_command(command)
-> 1309 return_value = get_return_value(
1310 answer, self.gateway_client, self.target_id, self.name)
1311
~/miniconda3/envs/pyspark/lib/python3.9/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
~/miniconda3/envs/pyspark/lib/python3.9/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
332 format(target_id, ".", name, value))
333 else:
--> 334 raise Py4JError(
335 "An error occurred while calling {0}{1}{2}".
336 format(target_id, ".", name))
Py4JError: An error occurred while calling o52.fit
Additional context
AB#1984487
@Nitinsiwach I'm really sorry, but from the given information I don't know what the issue could be. The root error message seems to be "py4j.protocol.Py4JNetworkError: Answer from Java side is empty", and I'm not sure how that relates to the LightGBM classification model. I do see that you said it started failing on more data, so perhaps it ran into an OOM, and only increasing the number of nodes or the machine RAM might help.
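To illustrate that suggestion, a sketch of a session with more driver memory plus LightGBM's numBatches option, which splits training into separate batches; the 16g value and numBatches=4 are assumptions to experiment with, not settings verified against this data set:

from pyspark.sql import SparkSession
from synapse.ml.lightgbm import LightGBMClassifier

# More heap for the local driver JVM (example value; size it to the machine).
spark = SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.4") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .config("spark.driver.memory", "16g") \
    .getOrCreate()

# numBatches > 0 trains on the data in separate batches, which can lower peak memory;
# df is the assembled training DataFrame as in the sketch earlier in the issue.
model = LightGBMClassifier(featuresCol="features", labelCol="label", numBatches=4).fit(df)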