CoreNLP
StanfordNLP: Unable to identify Date with 7-class-ner
I'm using Stanford CoreNLP to extract date entities from text. Here's the code I tried:
import java.io.IOException;
import java.util.List;
import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
public class StanfordNLP_POC
{
    public static void main(String[] args) throws IOException
    {
        String classifierPath = "src//main//resources//classifiers//english.muc.7class.distsim.crf.ser.gz";
        String inputString = "Appointment Facility: ABC Medicine Clinic 05/07/2020 Progress Notes: Niel Armstrong, DO Current Medications Reason for Appointment";
        // Load the serialized 7-class CRF model (statistical NER only, no rule-based components).
        AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(classifierPath);
        // Each inner list holds the labeled tokens of one sentence.
        List<List<CoreLabel>> out = classifier.classify(inputString);
        System.out.println(out.toString());
        for (List<CoreLabel> sentence : out)
        {
            for (CoreLabel word : sentence)
            {
                // "O" marks tokens that are outside any named entity.
                if (word.getString(CoreAnnotations.AnswerAnnotation.class).equals("O"))
                    continue;
                System.out.println(word.word() + " = " + word.get(CoreAnnotations.AnswerAnnotation.class));
            }
        }
    }
}
I don't understand why it isn't extracting the date, even though it is clearly identifiable in the text.
When I run the same text through the full pipeline, the date is extracted, but it takes a bit longer.
The statistical model has no experience with that particular date format, so it does not tag it as a date.
If you run CoreNLP with only the statistical models, you can see that it doesn't recognize the date there either:
java edu.stanford.nlp.pipeline.StanfordCoreNLP -ner.statisticalOnly
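The full CoreNLP pipeline also applies some hard-coded expressions that recognize dates in that format, which is why the pipeline finds the date and the standalone CRF classifier does not.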
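To illustrate the general idea of a hard-coded date rule, here is a minimal sketch using only java.util.regex. It is not CoreNLP's actual rule set; the class name DateRuleSketch, the findDates helper, and the pattern itself are hypothetical, and the pattern assumes a month-first MM/DD/YYYY layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a rule-based date matcher (not CoreNLP code).
public class DateRuleSketch {
    // Assumed format: month (1-12), day (1-31), four-digit year, separated by slashes.
    private static final Pattern NUMERIC_DATE =
        Pattern.compile("\\b(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/(\\d{4})\\b");

    // Returns every substring of the text that matches the numeric date pattern.
    public static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = NUMERIC_DATE.matcher(text);
        while (matcher.find()) {
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String input = "Appointment Facility: ABC Medicine Clinic 05/07/2020 Progress Notes: "
            + "Niel Armstrong, DO Current Medications Reason for Appointment";
        System.out.println(findDates(input));
    }
}
```

A rule like this needs no training data for the format it targets, but it only covers the layouts it was written for, so it complements the statistical model rather than replacing it.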