
GPU support

Open kormoczi opened this issue 3 years ago • 13 comments

Hi, I am using the Marian models for translation. It works fine, but I assume it runs only on the CPU. I am using the following code:

pipe_translate = nlu.load('hu.translate_to.en')
translate = pipe_translate.predict("Sziasztok, mi a helyzet?")

The predict part takes about 5 seconds, and I have an A100 GPU, so I don't think it should take that long... I can't figure out how to use the GPU, or how to check whether it is using the GPU... (print(tf.test.gpu_device_name()) shows that the GPU is there...) Where can I find some documentation/info about this? I had some issues with the CUDA and Java installation, but right now these look fine...

Thanks

kormoczi avatar May 31 '21 15:05 kormoczi

Hi @kormoczi,

you can call nlu.load('any model', gpu=True), which will enable GPU mode for NLU.

Make sure you enable GPU mode in the very first call to NLU, otherwise it will not take effect. Keep in mind that the translator models are quite big and thus slow, even in GPU mode, but upgrades are planned.
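
For reference, a minimal sketch of what such a first call could look like (model name and example text taken from the question above):

import nlu

# GPU mode must be requested on the very first nlu.load call of the session
pipe_translate = nlu.load('hu.translate_to.en', gpu=True)
print(pipe_translate.predict("Sziasztok, mi a helyzet?"))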

C-K-Loan avatar May 31 '21 20:05 C-K-Loan

After some struggles with CUDA/Python/Ubuntu versions, I finally think the basic system is fine; I could run some basic tests on the GPU. But with NLU, I still have problems. Loading the CUDA libraries looks fine, but then I receive multiple errors; the first one is this:

2021-06-07 12:46:56.905136: E external/org_tensorflow/tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
2021-06-07 12:46:56.905182: E external/org_tensorflow/tensorflow/c/c_api.cc:2184] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
21/06/07 12:46:56 ERROR Instrumentation: org.tensorflow.exceptions.TensorFlowException: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
  at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:101)
  at org.tensorflow.Session.allocate(Session.java:576)
  at org.tensorflow.Session.(Session.java:97)
  at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:317)
  at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:127)
  at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:103)
  at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readTensorflowModel(SentenceDetectorDLModel.scala:338)
  at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:320)
  at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph$(SentenceDetectorDLModel.scala:318)
  at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:338)
  at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1(SentenceDetectorDLModel.scala:324)
  at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1$adapted(SentenceDetectorDLModel.scala:324)
  at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:31)
  at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:30)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:30)
  at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:41)
  at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:41)
  at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:19)
  at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$5(Pipeline.scala:277)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:162)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:157)
  at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:277)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:162)
  at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:157)
  at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:349)
  at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:395)
  at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:389)
  at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadPipeline(ResourceDownloader.scala:499)
  at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline(ResourceDownloader.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.lang.Thread.run(Thread.java:748)

Do you have any idea or suggestion? Thanks!

kormoczi avatar Jun 07 '21 12:06 kormoczi

Thank you for sharing, taking a closer look

C-K-Loan avatar Jun 07 '21 12:06 C-K-Loan

Hi @kormoczi, can you share which specific Python code you ran to cause this issue?

C-K-Loan avatar Jun 08 '21 16:06 C-K-Loan

Hi @C-K-Loan, sure, this is the Python code:

import nlu
input("Step #1 - nlu.load - Press Enter to proceed...")
pipe_translate_hu_en = nlu.load('hu.translate_to.en', gpu=True)
input("Step #2 - pipe_translate_hu_en.predict - Press Enter to proceed...")
translate_output = pipe_translate_hu_en.predict("Sziasztok, mi a helyzet?")
print(translate_output)
input("Step #3 - delete pipe - Press Enter to proceed...")
del pipe_translate_hu_en

The error already comes up during nlu.load.

I am not sure, but this error message ("Status: device kernel image is invalid") looks similar to an issue I had recently with another project. That was a PyTorch-based project, and I had to match the CUDA and torch versions and install the torch build with the appropriate CUDA support. I think the tensorflow-gpu version should also be matched to the CUDA version, but I am not sure how to check this within NLU. I can see this jar file in the cache: com.johnsnowlabs.nlo_tensorflow-gpu_2.12-0.2.2.jar, but this version number looks a little bit strange to me... (I have tried different CUDA versions, but so far without success...)

Thanks for your help!

kormoczi avatar Jun 09 '21 07:06 kormoczi

Hi @kormoczi, can you share which CUDA version you are running? You should have CUDA 11.2. I just tested on Google Colab and it works fine: https://colab.research.google.com/drive/1woxdvCSk7u_yhXrhCNx37L4OVC2tO47i?usp=sharing (if you click Runtime at the top, you can switch to a GPU runtime and test it).

Let me know if you have more trouble after installing CUDA 11.2.

C-K-Loan avatar Jun 09 '21 07:06 C-K-Loan

Hi @C-K-Loan, so far I have tested with CUDA 11.2.2 and CUDA 10.1. The problem with CUDA 11.2.2 (and 11.2 as well) is that I get a lot of errors about missing libraries (like libcudart.so.10.1, libcudnn.so.7, etc.), which is why I thought I should use CUDA 10.1. By the way, I am trying to install nlu with the following versions: openjdk-8-jre, pyspark==3.0.1, nlu==3.0.2. I can see that the install (and the versions) in the Colab notebook are different; I will try to reproduce those settings on my machine...

kormoczi avatar Jun 09 '21 09:06 kormoczi

Since NLU is based on Spark NLP, the GPU requirements are:

  • Spark NLP 3.x is based on TensorFlow 2.3.1, so the GPU requirements are CUDA 10.1 and cuDNN 7.x
  • Spark NLP 3.1 is based on TensorFlow 2.4.1, so the GPU requirements are CUDA 11 and cuDNN 8.0.2

Since the latest NLU is on Spark NLP 3.x, you should go with the first option. Make sure you follow the TensorFlow instructions for installing/setting up the GPU correctly, especially the LD_LIBRARY_PATH environment variable.
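
As a rough sanity check (a sketch only, assuming a Linux machine where the CUDA 10.1 runtime and cuDNN 7 should be resolvable via LD_LIBRARY_PATH), you could verify that the expected libraries are actually loadable before starting NLU:

import ctypes
import os

# Show the LD_LIBRARY_PATH that the Python process actually sees
print(os.environ.get('LD_LIBRARY_PATH', '<not set>'))

# Try to resolve the libraries that Spark NLP 3.x / TensorFlow 2.3.1 expects
for lib in ('libcudart.so.10.1', 'libcudnn.so.7'):
    try:
        ctypes.CDLL(lib)
        print('found', lib)
    except OSError:
        print('missing', lib, '- check the CUDA install and LD_LIBRARY_PATH')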

PS: As for why Google Colab with CUDA 11.x can work with Spark NLP 3.x, they simply have all the CUDA 10.1, 10.2, and 11.x dynamic files available in the path so Spark NLP finds them regardless of the default CUDA version. You should be able to see something like this:

external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-14 16:23:59.532205: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-05-14 16:23:59.534978: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-05-14 16:23:59.535568: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-05-14 16:23:59.538122: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-05-14 16:23:59.539839: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-05-14 16:23:59.545371: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-05-14 16:23:59.545456: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:23:59.546220: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:23:59.546892: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-14 16:23:59.546933: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-14 16:24:00.207163: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-14 16:24:00.207208: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-05-14 16:24:00.207215: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-05-14 16:24:00.207432: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:24:00.208113: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:24:00.208840: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:24:00.209546: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14766 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)

Also, this is a nice thread to read when GPU setup becomes tricky: https://spark-nlp.slack.com/archives/CA118BWRM/p1620933399356800

maziyarpanahi avatar Jun 09 '21 09:06 maziyarpanahi

Sorry, I am a little bit confused right now. @C-K-Loan told me I should have CUDA 11.2, and @maziyarpanahi, you told me I should have CUDA 10.1. Anyhow, I have tried both... With CUDA 11.2, even the libraries did not load. With CUDA 10.1, it looks like the libraries did load, but then I receive the error I mentioned before ("Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid"). Maybe the problem is that I am not explicitly installing/setting up TensorFlow? It is installed by nlu itself during the first run...

kormoczi avatar Jun 09 '21 10:06 kormoczi

@maziyarpanahi By the way, how can I request access to the thread you mentioned? When I click the link you provided, I just get an error: "doesn't have an account on this workspace". Thanks!

kormoczi avatar Jun 09 '21 10:06 kormoczi

@kormoczi my bad, what @maziyarpanahi suggested is correct. The current version of NLU is based on Spark NLP 3.x, which means you need CUDA 10.1 and cuDNN 7.x.

This looks most likely like a TensorFlow installation issue.

Maybe try verifying that TensorFlow has access to the GPU: https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell

The thread posted by @maziyarpanahi is visible once you join the Slack channel (https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA); it is our community, with over 2000 people helping each other.

Hope this helps

C-K-Loan avatar Jun 09 '21 10:06 C-K-Loan

@C-K-Loan Thanks, I could join the Slack channel.

I have double-checked the TensorFlow install (as described in the link you provided), using the following Python script:

import tensorflow as tf
input("Step #1 - Verify that Tensorflow loads - Press Enter to proceed...")
print(tf.__version__)
input("Step #2 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...")
print('Num GPUs Available: ', len(tf.config.experimental.list_physical_devices('GPU')))
input("Step #3 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...")
print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))

And it looks OK; this is the output:

2021-06-09 11:44:03.086382: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Step #1 - Verify that Tensorflow loads - Press Enter to proceed...
2.3.1
Step #2 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...
2021-06-09 11:44:06.328576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-06-09 11:44:06.387802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:86:00.0 name: A100-PCIE-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
2021-06-09 11:44:06.387848: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-06-09 11:44:06.390328: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-06-09 11:44:06.392879: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-06-09 11:44:06.393314: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-06-09 11:44:06.395953: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-06-09 11:44:06.397420: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-06-09 11:44:06.403439: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-06-09 11:44:06.406652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
Num GPUs Available:  1
Step #3 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...
Num GPUs Available:  1

But nlu still does not work; this script:

import nlu
input("Step #1 nlu.load - Press Enter to proceed...")
pipe_translate_hu_en = nlu.load('hu.translate_to.en', gpu=True)
input("Step #2 pipe.predict - Press Enter to proceed...")
translate_output = pipe_translate_hu_en.predict("Sziasztok, mi a helyzet?")
print(translate_output)
input("Step #3 del pipe - Press Enter to proceed...")
del pipe_translate_hu_en

gives the error mentioned at the beginning of this thread (the error happens during nlu.load).

But I think these are not the same TensorFlow installs: the first one is installed with pip, the second one is installed by nlu itself as a jar package... As far as I can see, these are the version numbers for the main jar packages installed by nlu:

  • spark-nlp-gpu_2.12/3.0.3/spark-nlp-gpu_2.12-3.0.3.jar
  • tensorflow-gpu_2.12/0.2.2/tensorflow-gpu_2.12-0.2.2.jar

Do these versions look OK? Or is there a way to use other versions here?

By the way, I have now installed nlu based on the colab_setup.sh script.

The value of LD_LIBRARY_PATH was "/usr/local/nvidia/lib:/usr/local/nvidia/lib64"; I replaced it with "/usr/local/cuda/lib64", but that did not change anything either (there is no directory named /usr/local/nvidia).

kormoczi avatar Jun 09 '21 11:06 kormoczi

Hi @kormoczi sorry for the late reply.

Could you test a couple of other TensorFlow-based models and see if this error occurs? For example, please try:

nlu.load('bert', gpu=True)

nlu.load('elmo', gpu=True)

nlu.load('xlnet', gpu=True)

Please let me know if you get the same errors or if this only happens on translate.
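
A quick way to run those checks in one go (a sketch only, reusing the calls above; 'Hello world' is just placeholder text):

import nlu

# Load a few other TensorFlow-based models in GPU mode to see whether the
# CUDA error is specific to the translation pipeline or hits every model.
for model in ('bert', 'elmo', 'xlnet'):
    pipe = nlu.load(model, gpu=True)
    print(model)
    print(pipe.predict('Hello world'))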

This could be related to SentenceDetectorDLModel, which appears in the original stack trace.

Alternatively, please try the following:

import nlu

pipe_translate = nlu.load('hu.translate_to.en')
print(pipe_translate.components)  # inspect the components of the loaded pipeline
# remove the second component, the sentence detector that triggers the error
pipe_translate.components.remove(pipe_translate.components[1])
pipe_translate.predict('Hello world', output_level='document')

This will remove the SentenceDetectorDL, which is the component causing the error in your pipeline.
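
If that workaround helps, it can also be combined with GPU mode (a sketch, assuming the sentence detector sits at the same index as above):

import nlu

# Load the translation pipeline in GPU mode, drop the sentence detector,
# and predict at document level so no sentence splitting is required.
pipe_translate = nlu.load('hu.translate_to.en', gpu=True)
pipe_translate.components.remove(pipe_translate.components[1])
print(pipe_translate.predict('Sziasztok, mi a helyzet?', output_level='document'))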

C-K-Loan avatar Jul 17 '21 05:07 C-K-Loan