spark-nlp
Wrong Documentation example
Hello, I tried to get POS tagging working by following the example at https://nlp.johnsnowlabs.com/2021/03/23/pos_snk_sk.html:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.training import *
from pyspark.ml import Pipeline
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
pos = PerceptronModel.pretrained("pos_snk", "sk")\
.setInputCols(["document", "token"])\
.setOutputCol("pos")
pipeline = Pipeline(stages=[
document_assembler,
sentence_detector,
posTagger
])
example = spark.createDataFrame([['Potom Maju nežne pohladila po hlávke a vraví : Spoznáš krásny veľký svet , Maja , hrejivé slniečko a nádherné lúky plné kvetov .']], ["text"])
result = pipeline.fit(example).transform(example)
It fails with the error: NameError: name 'posTagger' is not defined
When I rename pos to posTagger (posTagger = PerceptronModel.pretrained("pos_snk", "sk")), I receive another error:
IllegalArgumentException: requirement failed: Wrong or missing inputCols annotators in REGEX_TOKENIZER_6249f09e1595.
Current inputCols: sentence. Dataset's columns:
(column_name=text,is_nlp_annotator=false)
(column_name=document,is_nlp_annotator=true,type=document).
Make sure such annotators exist in your pipeline, with the right output names and that they have following annotator types: document
As far as I can see, the same problem exists in the Python examples for the POS models of other languages as well.
Thanks for reporting this; the Tokenizer() annotator is missing from the example code. I'll ask someone to go through those models and add the missing Tokenizer.
@ahmedlone127 Could you please have a look at models here (POS especially) and add the missing Tokenizer to both Python and Scala examples: https://github.com/JohnSnowLabs/spark-nlp/tree/master/docs/_posts/dcecchini
Thank you for the quick response. I confirm that this modified example works:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
spark = sparknlp.start()
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols("sentence")\
.setOutputCol("token")
posTagger = PerceptronModel.pretrained("pos_snk", "sk")\
.setInputCols(["document", "token"])\
.setOutputCol("pos")
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, posTagger])
example = spark.createDataFrame([[
'Potom Maju nežne pohladila po hlávke a vraví : Spoznáš krásny veľký svet , Maja , hrejivé slniečko a nádherné lúky plné kvetov .'
]], ["text"])
result = pipeline.fit(example).transform(example)
result.select("token.result", "pos.result").show(truncate=80)