spark-nlp
org.tensorflow.exceptions.TFInvalidArgumentException: indices[0,11] = 28937 is not in [0, 21128)
Is there an existing issue for this?
- [X] I have searched the existing issues and did not find a match.
Current Behavior
BertEmbeddings.pretrained() loads successfully.
But when I run BertEmbeddings.pretrained("bert_embeddings_chinese_roberta_wwm_ext", "zh"), I get the following exception:
2024-05-24 17:23:24.614208: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Exception in thread "main" org.tensorflow.exceptions.TFInvalidArgumentException: indices[0,11] = 28937 is not in [0, 21128)
[[{{node bert/embeddings/Gather}}]]
at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
at org.tensorflow.Session.run(Session.java:850)
at org.tensorflow.Session.access$300(Session.java:82)
at org.tensorflow.Session$Runner.runHelper(Session.java:552)
at org.tensorflow.Session$Runner.runNoInit(Session.java:499)
at org.tensorflow.Session$Runner.run(Session.java:495)
at com.johnsnowlabs.ml.ai.Bert.tag(Bert.scala:176)
at com.johnsnowlabs.ml.ai.Bert.sessionWarmup(Bert.scala:77)
at com.johnsnowlabs.ml.ai.Bert.<init>(Bert.scala:86)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings.setModelIfNotSet(BertEmbeddings.scala:267)
at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.readModel(BertEmbeddings.scala:432)
at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.readModel$(BertEmbeddings.scala:427)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.readModel(BertEmbeddings.scala:492)
at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.$anonfun$$init$$1(BertEmbeddings.scala:444)
at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.$anonfun$$init$$1$adapted(BertEmbeddings.scala:444)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:515)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:507)
at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:44)
at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:41)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedBertModel$$super$pretrained(BertEmbeddings.scala:492)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained(BertEmbeddings.scala:418)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained$(BertEmbeddings.scala:417)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:492)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:492)
at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:47)
at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:47)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedBertModel$$super$pretrained(BertEmbeddings.scala:492)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained(BertEmbeddings.scala:415)
at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained$(BertEmbeddings.scala:414)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:492)
at com.algo.recom.article_recommender.v20240511.test_spark_nlp$.main(test_spark_nlp.scala:7)
at com.algo.recom.article_recommender.v20240511.test_spark_nlp.main(test_spark_nlp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
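For readers decoding the message: it says that the token id 28937, at batch row 0 and sequence position 11, falls outside an embedding table with 21128 rows. The trace also shows the failure occurring under Bert.sessionWarmup during model construction, so it happens before any user text is processed. The following is a minimal sketch of that kind of range check, not TensorFlow's or Spark NLP's actual code; the object name GatherBoundsSketch, the function gather, the table width of 4, and the placeholder id 101 are all assumptions made for illustration.

```scala
// Illustrative only: mimics the range check an embedding lookup performs.
// This is not TensorFlow's or Spark NLP's implementation.
object GatherBoundsSketch {

  /** Returns a message for the first out-of-range token id, or the gathered rows. */
  def gather(
      table: Array[Array[Float]], // embedding table: one row per vocabulary entry
      indices: Array[Array[Int]]  // token ids, shape [batch, sequence]
  ): Either[String, Array[Array[Array[Float]]]] = {
    val vocabSize = table.length
    // Lazily scan every (batch, position) pair for an id outside [0, vocabSize).
    val firstBad = for {
      (row, b) <- indices.zipWithIndex.iterator
      (id, t)  <- row.zipWithIndex.iterator
      if id < 0 || id >= vocabSize
    } yield s"indices[$b,$t] = $id is not in [0, $vocabSize)"

    if (firstBad.hasNext) Left(firstBad.next())
    else Right(indices.map(_.map(id => table(id))))
  }

  def main(args: Array[String]): Unit = {
    // 21128 rows matches the upper bound in the reported message; width 4 is arbitrary.
    val table = Array.fill(21128, 4)(0.0f)
    // Twelve token ids: position 11 holds the id from the report, the rest are placeholders.
    val ids = Array(Array.fill(11)(101) :+ 28937)
    println(gather(table, ids))
  }
}
```

With these inputs, gather returns a Left carrying the same text as the reported exception. This only illustrates what the message means; it does not establish where the out-of-range id comes from in Spark NLP.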
Expected Behavior
The model downloads and loads successfully.
Steps To Reproduce
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
BertEmbeddings.pretrained("bert_embeddings_chinese_roberta_wwm_ext","zh")
spark-submit script:
#!/bin/bash
jar_file="./article_recommender_spark3-2.0-SNAPSHOT.jar"
class_name="com.algo.recom.article_recommender.v20240511.test_spark_nlp"
/opt/spark3/bin/spark-submit \
--name action_sequence_123 \
--master "local[1]" \
--files /opt/spark3/conf/hive-site.xml \
--class "$class_name" \
--jars hdfs:///apps/recommend/models/jars/xueyuan/mzreader/spark-nlp-assembly-5.3.3.jar \
"$jar_file"
Spark NLP version and Apache Spark
- OS: CentOS Linux release 8.4.2105
- Spark version: 2.2.1
- Scala version: 2.11.8
- Java version: 1.8.0_144
- Spark NLP: I use the Fat JAR downloaded from https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.3.3.jar
Confirming that the CPU supports the instructions named in the log (AVX2, AVX512F, FMA):
lscpu | grep -i -e AVX512F -e AVX2 -e FMA
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 arat pku ospke