spark-nlp
Import error in 3.3.4 (Python 3.9): ModuleNotFoundError: No module named 'com.johnsnowlabs'
Description
When trying to use the Python version of Spark NLP, I am suddenly getting the following error on import sparknlp:
ModuleNotFoundError: No module named 'com.johnsnowlabs'
Expected Behavior
There should be no error.
Current Behavior
Instead, I get this error: ModuleNotFoundError: No module named 'com.johnsnowlabs'
Steps to Reproduce
- Create a new venv for Python 3.9
- Install the following in this venv: numpy 1.21.4, py4j 0.10.9.2, pyspark 3.2.0, spark-nlp 3.3.4
- Go to the Python shell
- Type "import sparknlp".
- Result: the error above, originating from annotator.py, line 26 (import com.johnsnowlabs.nlp)
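When an import fails like this, a first step is to check where Python actually resolves the package from. The following is a generic standard-library diagnostic sketch (not part of the spark-nlp API); the helper name locate is hypothetical:

```python
import importlib.util

# Diagnostic sketch: report the file Python would load a module from.
# None means the package is not installed in the active environment;
# an unexpected path (e.g. a stray sparknlp.py in the working
# directory) means the real package is being shadowed.
def locate(module_name):
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None

print(locate("sparknlp"))
```

If this prints None, pip install spark-nlp never ran in the active environment; if it prints a path outside the expected site-packages, something else is shadowing the package.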
Your Environment
- Spark NLP version (sparknlp.version()): 3.3.4
- Apache Spark version (spark.version): pyspark 3.2.0
- Java version (java -version): openjdk version "1.8.0_282"
- Setup and installation (Pypi, Conda, Maven, etc.):
- Operating System and version: Mac OS 11.6 (Big Sur) on M1 CPU
Unfortunately, your setup combines everything that is not supported at the moment:
- Spark/PySpark 3.2.x is not supported: https://github.com/JohnSnowLabs/spark-nlp#apache-spark-support
- Python 3.9 is not supported: https://github.com/JohnSnowLabs/spark-nlp#scala-and-python-support
- M1 is not working with TF: https://github.com/JohnSnowLabs/spark-nlp/discussions/2282
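The constraints above can be sketched as a small pre-flight guard. The version bounds here are assumptions taken from this thread (spark-nlp 3.3.4 needing Python below 3.9 and PySpark below 3.2), not an official compatibility matrix:

```python
import sys

# Assumed bounds from the discussion above: Python < 3.9, PySpark < 3.2.
SUPPORTED_PYTHON = (3, 9)
SUPPORTED_PYSPARK = (3, 2)

def check_env(python_version, pyspark_version):
    """Return True if both versions fall inside the assumed supported range."""
    py_ok = tuple(python_version[:2]) < SUPPORTED_PYTHON
    spark_ok = tuple(int(x) for x in pyspark_version.split(".")[:2]) < SUPPORTED_PYSPARK
    return py_ok and spark_ok

print(check_env(sys.version_info, "3.1.2"))
```

Running this before importing sparknlp makes it obvious when the interpreter or PySpark version falls outside the assumed range.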
Thanks a lot @maziyarpanahi for the quick response. Strangely enough, I haven't had any SparkNLP issues before with the same setup. It only popped up last week. However, I did run into issues before with TF on M1 and had to switch from Anaconda to Miniforge as a result. So, I will try to verify that I am still on Miniforge and try to downgrade PySpark and Python.
Thanks for your update. Then I would say the first usual suspect is pyspark==3.2.x, which you can downgrade to pyspark==3.1.2 to see what happens. If the error persists, you can then downgrade Python 3.9 to 3.8.x.
If you manage to successfully use spark-nlp and its TF annotators on M1, it would be great if you could share how in this discussion (unfortunately, I don't have an M1 around, so I could never really try it myself): https://github.com/JohnSnowLabs/spark-nlp/discussions/2282
I have tried downgrading pyspark to 3.1.2, but that didn't work. Then I created a new conda environment with Python 3.8, spark-nlp 3.3.4, and pyspark 3.1.2. But even in this brand-new conda environment I am getting the error ModuleNotFoundError: No module named 'com.johnsnowlabs'.
That's really strange. Setting aside what works or not, whether a plain import sparknlp succeeds depends only on whether spark-nlp is pip-installed in your Python environment:

So this should really work at least for the import:
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
$ pip install spark-nlp==3.3.4 pyspark==3.1.2
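If a fresh environment still fails the same way, it's worth confirming the shell is actually using that environment's interpreter rather than an older one. A quick standard-library sanity check, run inside the activated environment:

```python
import sys
import sysconfig

# Print which interpreter and site-packages directory are actually
# active; if these point outside the newly created environment,
# the old (broken) installation is still the one being used.
print(sys.executable)
print(sysconfig.get_paths()["purelib"])
```

Both paths should sit under the new conda environment's prefix (e.g. .../envs/sparknlp/...); if not, the environment was never really activated for that shell.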
Thanks a lot, @maziyarpanahi . Strange indeed. I'll keep digging around a bit more.
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days