
StanfordNLP: Unable to identify Date with 7-class-ner

Open • tarunshah opened this issue 5 years ago • 1 comment

I'm using Stanford NLP to extract date entities from text. Here's the code I tried:

import java.io.IOException;
import java.util.List;
import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;

public class StanfordNLP_POC
{

    public static void main(String[] args) throws IOException
    {
        // Path to the 7-class MUC model (Location, Person, Organization, Money, Percent, Date, Time)
        String classifierPath = "src//main//resources//classifiers//english.muc.7class.distsim.crf.ser.gz";

        String inputString = "Appointment Facility: ABC Medicine Clinic 05/07/2020 Progress Notes: Niel Armstrong, DO Current Medications Reason for Appointment";

        // Use the generic type instead of the raw type to avoid unchecked warnings
        AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(classifierPath);

        List<List<CoreLabel>> out = classifier.classify(inputString);

        System.out.println(out.toString());

        for (List<CoreLabel> sentence : out)
        {
            for (CoreLabel word : sentence)
            {

                // Skip tokens labeled "O" (outside any named entity)
                if (word.getString(CoreAnnotations.AnswerAnnotation.class).equals("O"))
                    continue;
                System.out.println(word.word() + " = " + word.get(CoreAnnotations.AnswerAnnotation.class));
            }
        }

    }

}

I don't understand why it isn't extracting the date, even though it is clearly identifiable in the text.

Also, when I try the full pipeline, it does extract the date, but it takes a bit longer to do so.

tarunshah Jul 22 '20 13:07

The statistical model doesn't have any experience recognizing that particular date format.

If you run CoreNLP with only the statistical models, you can see that it doesn't recognize the date there either:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -ner.statisticalOnly

The CoreNLP pipeline also uses some hard-coded expressions, which recognize dates in that format.
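
To illustrate what a hard-coded expression for this date format can look like, here is a minimal standalone sketch. It is not CoreNLP's actual rule set: the class name SlashDateFinder, the method findDates, and the regular expression are all hypothetical, and the pattern assumes month-first dates (MM/DD/YYYY), so it cannot tell 05/07/2020 as May 7 apart from 5 July.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: finds slash-separated dates with a fixed pattern.
public class SlashDateFinder {

    // Assumption: month first (1-12), then day (1-31), then a four-digit year.
    // Day-of-month is not validated against the month (e.g. 02/31/2020 would match).
    private static final Pattern SLASH_DATE =
            Pattern.compile("\\b(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/(\\d{4})\\b");

    // Returns every substring of the text that matches the pattern, in order of appearance.
    public static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = SLASH_DATE.matcher(text);
        while (matcher.find()) {
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String inputString = "Appointment Facility: ABC Medicine Clinic 05/07/2020 "
                + "Progress Notes: Niel Armstrong, DO Current Medications Reason for Appointment";
        System.out.println(findDates(inputString)); // prints [05/07/2020]
    }
}
```

A pattern like this is cheap to run next to the CRF classifier when only one known date format matters, but it generalizes far less than the rules the full pipeline applies.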

AngledLuffa Jul 22 '20 15:07