CoreNLP
CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
The new Spanish models trained on LDC data contain a few bogus POS tags. I call them "bogus" because they only show up a few times in the corpus, and...
Hi All, Thanks for the great software. I would like to ask you the following please. When training specific relations to be extracted from custom Entity types, using the Relation...
@manning reported second-hand that character offset annotations seem to be wrong for Spanish NER output. Investigate.
I tried the following simple code: ``` java String template = "[]|[]{2}"; TokenSequencePattern pattern = TokenSequencePattern.compile(template); Annotation document = new Annotation("word1 word2"); new TokenizerAnnotator(false, "en").annotate(document); List<CoreLabel> tokens = document.get(CoreAnnotations.TokensAnnotation.class); if (pattern.getMatcher(tokens).matches())...
In Mention.java, within the `edu.stanford.nlp.hcoref.data` package, I'm noticing some inconsistencies with how the `ner` types of "PERSON" and "PERCENT" are handled. As far as I could tell, [this page](http://nlp.stanford.edu/software/CRF-NER.shtml) details...
With the base units supplied here https://github.com/stanfordnlp/CoreNLP/blob/f569983c8ad4e7890139b77775865cce1b82d4dc/src/edu/stanford/nlp/ie/qe/rules/units.txt meter, kilogram, and liter get extracted regardless of whether the numeric quantity and type are collapsed or separated by a space: 10m, 10 m, 3kg, 3 kg. Pretty...
`TimeExpressionExtractorFactory.isDefaultExtractorPresent()` checks whether the class for the default time expression extractor is present (i.e. `edu.stanford.nlp.time.TimeExpressionExtractorImpl`, which is in the CoreNLP jar and thus likely always present). It doesn't check though...
Hi, in the context of dkpro's StanfordCoreferenceResolver, I found the following problem: Stanford CoreNLP (v3.4.1) appears to be planning changes to IndexedWord: word() and value() both exist but according...
A common occurrence in English text around the world, especially in news articles, is the use of the following convention (sourced from [wikipedia](https://en.wikipedia.org/wiki/Quotation_marks_in_English#Quotations_and_speech)): > The convention in English is to give...
Hi, While running unit tests on some code that uses SUTime, we noticed that all tests passed with Java 1.8 but that one failed with Java 11. In both cases...