
Import error in 3.3.4 (Python 3.9): ModuleNotFoundError: No module named 'com.johnsnowlabs'

Open mwunderlich opened this issue 4 years ago • 7 comments

Description

When trying to use the Python version of Spark NLP, I am suddenly getting the following error on import sparknlp: ModuleNotFoundError: No module named 'com.johnsnowlabs'

Expected Behavior

The import should succeed without any error.

Current Behavior

Instead, I get this error: ModuleNotFoundError: No module named 'com.johnsnowlabs'

Steps to Reproduce

  1. Create a new venv for Python 3.9
  2. Install the following in this venv: numpy 1.21.4, py4j 0.10.9.2, pyspark 3.2.0, spark-nlp 3.3.4
  3. Go to the Python shell
  4. Type "import sparknlp".
  5. Result: The error above (originating from annotator.py, line 26: import com.johnsnowlabs.nlp)
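The pinned versions in step 2 can be checked against what is actually installed in the venv before attempting the import. A minimal sketch using only the standard library (importlib.metadata, Python 3.8+); the helper name is mine, not part of the thread:

```python
from importlib import metadata

# Pins from step 2 of the reproduction above.
PINNED = {
    "numpy": "1.21.4",
    "py4j": "0.10.9.2",
    "pyspark": "3.2.0",
    "spark-nlp": "3.3.4",
}

def check_pins(pins):
    """Return {package: installed_version} for every pin that does not
    match; a value of None means the package is not installed at all."""
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[pkg] = installed
    return mismatches

# An empty dict means the environment matches the reproduction exactly.
print(check_pins(PINNED))
```

Running this in the affected venv confirms the reproduction environment before blaming the import itself.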

Your Environment

  • Spark NLP version sparknlp.version(): 3.3.4
  • Apache Spark version spark.version: pyspark 3.2.0
  • Java version java -version: openjdk version "1.8.0_282"
  • Setup and installation (Pypi, Conda, Maven, etc.):
  • Operating System and version: Mac OS 11.6 (Big Sur) on M1 CPU

mwunderlich avatar Nov 29 '21 07:11 mwunderlich

Unfortunately, your setup combines several things that are not supported at the moment:

  • Spark/PySpark 3.2.x is not supported: https://github.com/JohnSnowLabs/spark-nlp#apache-spark-support
  • Python 3.9 is not supported: https://github.com/JohnSnowLabs/spark-nlp#scala-and-python-support
  • TensorFlow does not work on M1 yet: https://github.com/JohnSnowLabs/spark-nlp/discussions/2282
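The unsupported combinations above can be checked programmatically before installing. A small sketch; the supported sets below are an assumption inferred from this comment (Python 3.9 and PySpark 3.2.x unsupported at the time), and the linked support matrix is the authoritative source:

```python
# Assumed support sets, inferred from the comment above; check the
# official support matrix for current values.
SUPPORTED_PY = {(3, 6), (3, 7), (3, 8)}
SUPPORTED_PYSPARK = {"3.0", "3.1"}

def env_ok(py_version, pyspark_version):
    """True when both the Python and PySpark versions fall in the sets above."""
    major_minor = pyspark_version.rsplit(".", 1)[0]  # "3.2.0" -> "3.2"
    return tuple(py_version[:2]) in SUPPORTED_PY and major_minor in SUPPORTED_PYSPARK

# The reporter's combination: Python 3.9 with PySpark 3.2.0.
print(env_ok((3, 9), "3.2.0"))  # False: neither version is in the supported sets
```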

maziyarpanahi avatar Nov 29 '21 09:11 maziyarpanahi

Thanks a lot @maziyarpanahi for the quick response. Strangely enough, I haven't had any Spark NLP issues with this same setup before; the error only popped up last week. However, I did run into issues with TF on M1 before and had to switch from Anaconda to Miniforge as a result. So I will verify that I am still on Miniforge and try downgrading PySpark and Python.

mwunderlich avatar Nov 29 '21 09:11 mwunderlich

Thanks for your update. In that case the first usual suspect is pyspark==3.2.x, which you can downgrade to pyspark==3.1.2 to see what happens. If the error persists, downgrade Python from 3.9 to 3.8.x as well.

If you manage to successfully use spark-nlp and its TF annotators on M1, it would be great if you could share how in this discussion (unfortunately, I don't have an M1 around, so I have never been able to try it myself): https://github.com/JohnSnowLabs/spark-nlp/discussions/2282

maziyarpanahi avatar Nov 29 '21 09:11 maziyarpanahi

I have tried downgrading pyspark to 3.1.2, but that didn't help. I then created a new conda environment with Python 3.8, spark-nlp 3.3.4, and pyspark 3.1.2, but even in this brand-new environment I am getting the error ModuleNotFoundError: No module named 'com.johnsnowlabs'.

mwunderlich avatar Nov 29 '21 11:11 mwunderlich
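One way to narrow this down (an editorial sketch, not something suggested in the thread) is to ask Python where sparknlp would be loaded from, without importing it. A path outside the active environment's site-packages would point at a stale or shadowing install:

```python
import importlib.util

def locate_module(name):
    """Return the file a top-level module would be loaded from,
    or None if the import system cannot find it at all."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

# In the affected environment, run print(locate_module("sparknlp"))
# and compare the path against the venv's site-packages directory.
# Demonstrated here on a stdlib package:
print(locate_module("json"))
```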

That's really strange. Setting aside which versions are supported, whether the simple import sparknlp works depends only on whether you have pip-installed spark-nlp in your Python environment:

(screenshot)

So this should really work at least for the import:

$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
$ pip install spark-nlp==3.3.4 pyspark==3.1.2
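If the fresh environment still fails, it may help to distinguish "package not found" from "package present but its own imports break" (the annotator.py traceback above is the latter). A rough triage sketch, assuming nothing beyond the standard library:

```python
import importlib
import importlib.util

def diagnose(name):
    """Coarse triage for a failing `import <name>`."""
    if importlib.util.find_spec(name) is None:
        return "not installed"  # a plain pip install is the fix
    try:
        importlib.import_module(name)
    except ModuleNotFoundError as exc:
        # The package is on disk but something *it* imports is missing,
        # as with the com.johnsnowlabs error in this issue.
        return f"installed but broken: {exc}"
    return "imports fine"

print(diagnose("json"))  # imports fine
```

Calling diagnose("sparknlp") in the affected environment tells you which of the two situations you are in.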

maziyarpanahi avatar Nov 29 '21 11:11 maziyarpanahi

Thanks a lot, @maziyarpanahi . Strange indeed. I'll keep digging around a bit more.

mwunderlich avatar Nov 29 '21 13:11 mwunderlich

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Jun 16 '22 00:06 github-actions[bot]
