
Constituency Parsing?

Open shopuz opened this issue 8 years ago • 2 comments

I can see that there is a function defined for dependency parsing, depparse. However, I can't find a constituency parsing function (parse) in the list of functions. Is there any way to get the constituency parse?

shopuz avatar Feb 01 '17 01:02 shopuz

I'm wondering this too!

You can add it as an additional user-defined function, either in the functions file or in whatever file you're working in (as long as you have all of the necessary import statements).

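// Tree is a Java collection of its own subtrees, so this joins the
// string form of every subtree (root first) into one long string.
def parse = udf { sentence: String =>
    new Sentence(sentence).parse().asScala.map(_.toString).mkString(" ")
}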

and you would use it as

val input = Seq(
    (1, "Stanford is located in California. There are sometimes mountain lions on campus.")
).toDF("id", "quote")

val output = input.select(col("quote"), explode(ssplit(col("quote"))).as("sent")).select(col("quote"), col("sent"), parse(col("sent")).as("parse"))

output.show()

(Edited this comment to be more correct after I played around with it in spark-shell.)

lucy3 avatar Aug 02 '18 08:08 lucy3

Thanks @lucy3. I tried running the code below in spark-shell, and the output is a little better:

import java.util.Properties

import scala.collection.JavaConverters._

import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, CleanXmlAnnotator, StanfordCoreNLP, TokenizerAnnotator}
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentiment
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import edu.stanford.nlp.simple.{Document, Sentence}
import edu.stanford.nlp.util.Quadruple
import org.apache.spark.sql.functions.udf

import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._

def parse = udf { sentence: String => 
  new Sentence(sentence).parse().pennString().replace("\n", "")
}

and, as in @lucy3's example, it can be used as:

val input = Seq(
    (1, "Stanford is located in California. There are sometimes mountain lions on campus.")
).toDF("id", "quote")
val output = input.select(col("quote"), explode(ssplit(col("quote"))).as("sent")).select(col("quote"), col("sent"), parse(col("sent")).as("parse"))
output.show()
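
One caveat with the UDF above: pennString() pretty-prints the tree across several indented lines, so removing only the newlines leaves runs of spaces in the result. A minimal sketch of a whitespace-collapsing alternative, in plain Scala, is below. The helper name flattenPenn and the sample string are made up for illustration; the sample is hand-written bracket notation, not real parser output.

```scala
// Hypothetical helper (not part of spark-corenlp or CoreNLP): collapse every
// run of whitespace, including newlines and indentation, into a single space.
def flattenPenn(penn: String): String =
  penn.trim.replaceAll("\\s+", " ")

// Hand-written sample in Penn bracket notation, laid out the way a
// pretty-printer would indent it. This is an assumption for illustration only.
val sample =
  """(ROOT
    |  (S
    |    (NP (NNP Stanford))
    |    (VP (VBZ is))))""".stripMargin

// Prints the whole tree on one line with single spaces between tokens.
println(flattenPenn(sample))
```

Inside the UDF, the .replace("\n", "") call could then be swapped for flattenPenn(...) applied to the pennString() result, if a single-line tree with uniform spacing is preferred.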

phuongnm94 avatar Apr 22 '21 14:04 phuongnm94