ipex-llm
<class 'ModuleNotFoundError'>: No module named 'dataset'
My code consists of several .py files:
- brainMRI.py
- dataset.py
- Unet.py
I want to use the bigdl backend to train the model.
if args.cluster_mode == "local":
    init_orca_context(memory=args.memory)

if args.backend == "bigdl":
    net = model_creator(config={})
    optimizer = optim_creator(model=net, config={"lr": 0.001})
    orca_estimator = Estimator.from_torch(model=net,
                                          optimizer=optimizer,
                                          loss=bce_dice_loss,
                                          metrics=[],
                                          backend=args.backend,
                                          )
    orca_estimator.fit(data=train_loader, epochs=args.epochs)
The cluster_mode is local, but I got the following error:
2022-06-27 11:07:40 ERROR TaskSetManager:70 - Task 0 in stage 1.0 failed 1 times; aborting job
Traceback (most recent call last):
File "brainMRI.py", line 192, in <module>
orca_estimator.fit(data=train_loader, epochs=args.epochs)
File "/home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/bigdl/orca/learn/pytorch/pytorch_spark_estimator.py", line 168, in fit
train_fset, val_fset = self._handle_data_loader(data, validation_data)
File "/home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/bigdl/orca/learn/pytorch/pytorch_spark_estimator.py", line 94, in _handle_data_loader
train_feature_set = FeatureSet.pytorch_dataloader(data, "", "")
File "/home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/bigdl/dllib/feature/common.py", line 389, in pytorch_dataloader
False, features, labels)
File "/home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/bigdl/dllib/utils/file_utils.py", line 227, in callZooFunc
raise e
File "/home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/bigdl/dllib/utils/file_utils.py", line 221, in callZooFunc
java_result = api(*args)
File "/home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o42.createFeatureSetFromPyTorch.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): jep.JepException: jep.JepException: <class 'ModuleNotFoundError'>: No module named 'dataset'
at com.intel.analytics.bigdl.orca.utils.PythonInterpreter$.threadExecute(PythonInterpreter.scala:98)
at com.intel.analytics.bigdl.orca.utils.PythonInterpreter$.exec(PythonInterpreter.scala:108)
at com.intel.analytics.bigdl.orca.net.PythonFeatureSet$$anonfun$loadPythonSet$1.apply(PythonFeatureSet.scala:96)
at com.intel.analytics.bigdl.orca.net.PythonFeatureSet$$anonfun$loadPythonSet$1.apply(PythonFeatureSet.scala:86)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: jep.JepException: <class 'ModuleNotFoundError'>: No module named 'dataset'
at /home/arda/anaconda3/envs/mainly/lib/python3.7/site-packages/pyspark/serializers.loads(serializers.py:587)
at <string>.<module>(<string>:4)
at jep.Jep.exec(Native Method)
at jep.Jep.exec(Jep.java:478)
at com.intel.analytics.bigdl.orca.utils.PythonInterpreter$$anonfun$1.apply$mcV$sp(PythonInterpreter.scala:106)
at com.intel.analytics.bigdl.orca.utils.PythonInterpreter$$anonfun$1.apply(PythonInterpreter.scala:105)
at com.intel.analytics.bigdl.orca.utils.PythonInterpreter$$anonfun$1.apply(PythonInterpreter.scala:105)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
I tried running export PYTHONPATH=/the/path/to/brainMRI:$PYTHONPATH before launching the script, and then it runs successfully.
How can I solve this problem from within the Python code instead?
@qiuxin2012 Could you take a look at this issue? If there are extra files, do we need to add them to PYTHONPATH manually? On the Python side, the current working directory is automatically on the module search path, but on the Java side it is not?
Yes, jep uses a different implementation from Python.
Then we will add the export PYTHONPATH step to our README for this example. Do you think this should also be added to the documentation somewhere? @qiuxin2012
Added export PYTHONPATH=/the/path/to/brainMRI:$PYTHONPATH to the README. Fixed.
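On the original question of handling this inside the Python code rather than in the shell: one possible approach is to set PYTHONPATH from the script itself before init_orca_context is called. The sketch below is untested against BigDL. It assumes the JVM (and the jep interpreter embedded in it) inherits the driver process's environment when the context is created, which is what makes the shell export work in local mode. The helper name prepend_to_pythonpath is hypothetical and not part of any library.

```python
import os


def prepend_to_pythonpath(directory, environ=os.environ):
    """Prepend `directory` to PYTHONPATH in `environ` unless it is already listed.

    Hypothetical helper: it only edits the environment mapping. Whether the
    embedded interpreter picks the value up is an assumption (see above).
    """
    directory = os.path.abspath(directory)
    # Split the existing value and drop empty entries.
    parts = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    environ["PYTHONPATH"] = os.pathsep.join(parts)
    return environ["PYTHONPATH"]


# Intended use at the top of brainMRI.py, before init_orca_context(...):
#
#     prepend_to_pythonpath(os.path.dirname(os.path.abspath(__file__)))
#     init_orca_context(memory=args.memory)
```

Because the directory is derived from the script's own location, the path does not need to be hard-coded. If this has no effect in a given setup, the shell export from the README remains the confirmed fix.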