spark-corenlp: Constituency Parsing?

I can see that there is a function for dependency parsing (depparse). However, I can't find constituency parsing (parse) in the list of functions. Is there any way to get constituency parses?

Feb 01 '17 01:02 shopuz

I'm wondering this too!

It would be an additional user-defined function (UDF), placed in the functions file or in whatever file you're working in (as long as you have all of the necessary import statements).

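import scala.collection.JavaConverters._
import edu.stanford.nlp.simple.Sentence
import org.apache.spark.sql.functions.udf
def parse = udf { sentence: String =>
    new Sentence(sentence).parse().asScala.map(_.toString).mkString(" ")  // a Tree iterates over all its subtrees, so every subtree is emitted
}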

and you would use it like this:

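import org.apache.spark.sql.functions.{col, explode}
import com.databricks.spark.corenlp.functions.ssplit
// toDF needs spark.implicits._, which spark-shell imports automatically

val input = Seq(
    (1, "Stanford is located in California. There are sometimes mountain lions on campus.")
).toDF("id", "quote")

// One row per sentence, then parse each sentence
val sentences = input.select(col("quote"), explode(ssplit(col("quote"))).as("sent"))
val output = sentences.select(col("quote"), col("sent"), parse(col("sent")).as("parse"))

output.show(false)  // truncate = false, so the full parse strings are displayed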

(Edited this comment to be more correct after I played around with it in spark-shell.)

Aug 02 '18 08:08 lucy3

Thanks @lucy3! I tried running the code below in spark-shell, and the output is a little better:

import edu.stanford.nlp.simple.Sentence
import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._

// pennString() pretty-prints the tree over several lines;
// dropping the newlines gives one parse string per sentence.
def parse = udf { sentence: String => 
  new Sentence(sentence).parse().pennString().replace("\n", "")
}
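Because pennString() indents each level of the tree, removing only "\n" keeps the indentation spaces (and leaves a stray "\r" wherever the line separator is "\r\n"). A minimal sketch in plain Scala, with no CoreNLP dependency, that collapses every run of whitespace instead; PennFlatten and flatten are hypothetical names, and the sample string in the note below is hand-written to mimic the pennString() layout, not real parser output:

```scala
object PennFlatten {
  // Replace every run of whitespace (spaces, "\n", "\r", tabs) with a single
  // space, then drop any leading or trailing whitespace.
  def flatten(penn: String): String = penn.replaceAll("\\s+", " ").trim
}
```

For example, "(ROOT\n  (S\n    (NP (NNP Stanford))\n    (VP (VBZ is))))\n" becomes "(ROOT (S (NP (NNP Stanford)) (VP (VBZ is))))", whereas .replace("\n", "") would give "(ROOT  (S    (NP (NNP Stanford))    (VP (VBZ is))))". Inside the UDF this would mean calling PennFlatten.flatten on the pennString() result; calling toString on the Tree should also give a single-line bracketed form directly.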

and, as in @lucy3's example, it can be used as follows:

val input = Seq(
    (1, "Stanford is located in California. There are sometimes mountain lions on campus.")
).toDF("id", "quote")
val sentences = input.select(col("quote"), explode(ssplit(col("quote"))).as("sent"))
val output = sentences.select(col("quote"), col("sent"), parse(col("sent")).as("parse"))
output.show(false)
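If you need individual constituents (for example all noun phrases) rather than the whole tree string, the single-line bracketed string in the parse column can be read back into a tree. A minimal sketch in plain Scala with no Spark or CoreNLP dependency; Node, Bracketed, and phrases are hypothetical names, and it assumes that labels and words contain no literal parentheses (CoreNLP's tokenizer normally escapes them as -LRB-/-RRB-, but check this for your data):

```scala
import scala.collection.mutable.ListBuffer

// A constituency-tree node: a label plus children; words are nodes with no children.
final case class Node(label: String, children: List[Node]) {
  // Words under this node, left to right.
  def leaves: List[String] =
    if (children.isEmpty) List(label) else children.flatMap(_.leaves)

  // This node and every node below it, in preorder.
  def subtrees: List[Node] = this :: children.flatMap(_.subtrees)
}

object Bracketed {
  // "(NP (NNP Stanford))" -> "(", "NP", "(", "NNP", "Stanford", ")", ")".
  // Splitting on \s+ also tolerates the extra spaces left by pennString().
  private def tokenize(s: String): List[String] =
    s.replace("(", " ( ").replace(")", " ) ").trim.split("\\s+").toList

  // Recursive-descent parse: returns the node and the tokens not yet consumed.
  // Malformed input (unbalanced parentheses) ends in an exception.
  private def parseNode(tokens: List[String]): (Node, List[String]) = tokens match {
    case "(" :: label :: rest =>
      var remaining = rest
      val kids = ListBuffer.empty[Node]
      while (remaining.head != ")") {
        val (child, next) = parseNode(remaining)
        kids += child
        remaining = next
      }
      (Node(label, kids.toList), remaining.tail) // skip the closing ")"
    case word :: rest => (Node(word, Nil), rest)
    case Nil => throw new IllegalArgumentException("unexpected end of input")
  }

  def parse(s: String): Node = parseNode(tokenize(s))._1

  // Surface text of every constituent carrying the given label, e.g. "NP".
  def phrases(tree: Node, label: String): List[String] =
    tree.subtrees
      .filter(n => n.label == label && n.children.nonEmpty)
      .map(_.leaves.mkString(" "))
}
```

With a hand-written tree such as "(ROOT (S (NP (NNP Stanford)) (VP (VBZ is) (VP (VBN located) (PP (IN in) (NP (NNP California))))) (. .)))", Bracketed.phrases(tree, "NP") returns List("Stanford", "California"). To do this inside Spark, the same call could be wrapped in another UDF over the parse column.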

Apr 22 '21 14:04 phuongnm94
