CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.

Results 152 CoreNLP issues

The new Spanish models trained on LDC data contain a few bogus POS tags. I call them "bogus" because they only show up a few times in the corpus, and...

bug
multilingual

Hi All, Thanks for the great software. I would like to ask you the following please. When training specific relations to be extracted from custom Entity types, using the Relation...

enhancement
kbp

@manning reported second-hand that character offset annotations seem to be wrong for Spanish NER output. Investigate.

bug
tokenize

I tried the following simple code: ``` java String template = "[]|[]{2}"; TokenSequencePattern pattern = TokenSequencePattern.compile(template); Annotation document = new Annotation("word1 word2"); new TokenizerAnnotator(false, "en").annotate(document); List tokens = document.get(CoreAnnotations.TokensAnnotation.class); if (pattern.getMatcher(tokens).matches())...

bug
tokensregex

In Mention.java, within the `edu.stanford.nlp.hcoref.data` package, I'm noticing some inconsistencies with how the `ner` types of "PERSON" and "PERCENT" are handled. As far as I could tell, [this page](http://nlp.stanford.edu/software/CRF-NER.shtml) details...

bug
coref

With the base units supplied here (https://github.com/stanfordnlp/CoreNLP/blob/f569983c8ad4e7890139b77775865cce1b82d4dc/src/edu/stanford/nlp/ie/qe/rules/units.txt), meter, kilogram, and liter get extracted regardless of whether the numeric quantity and type are collapsed or separated by a space: 10m, 10 m, 3kg, 3 kg. Pretty...

bug
tokenize
ner

`TimeExpressionExtractorFactory.isDefaultExtractorPresent()` checks whether the class for the default time expression extractor is present (i.e. `edu.stanford.nlp.time.TimeExpressionExtractorImpl`, which is in the CoreNLP jar and thus likely always present). It doesn't check though...

bug
ner
sutime

Hi, in the context of dkpro's StanfordCoreferenceResolver, I found the following problem: Stanford CoreNLP (v3.4.1) seems to plan to make changes at IndexedWord: word() and value() both exist but according...

bug

A common occurrence in English text around the world, especially in news articles, is use of the following convention (sourced from [wikipedia](https://en.wikipedia.org/wiki/Quotation_marks_in_English#Quotations_and_speech)): > The convention in English is to give...

Hi, While running unit tests on some code that uses SUTime, we noticed that all tests passed with Java 1.8 but that one failed with Java 11. In both cases...