cogcomp-nlp
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, t...
Hello. TL;DR; StatefulTokenizer tokenizes the date "10/23/2018" as [ "10", "/", "23/2018" ] whereas IllinoisTokenizer (which seems to be deprecated) keeps it as a single token [ "10/23/2018" ]. Longer...
[Here](https://github.com/CogComp/cogcomp-nlp/blob/master/edison/src/main/java/edu/illinois/cs/cogcomp/edison/features/factory/WordFeatureExtractorFactory.java#L239-L253), a resource is loaded every time features are added. Ideally, the resource should be loaded only once, the first time it is needed.
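One way to get load-once behavior is a small lazy holder. The sketch below is illustrative only: `LazyResource` is a hypothetical helper, not an existing class in cogcomp-nlp, and it assumes the loader never returns `null`.

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of cogcomp-nlp): runs the loader on the
 * first call to get() and returns the cached value on every later call.
 */
final class LazyResource<T> {
    private final Supplier<T> loader;
    private volatile T value; // null until the first successful load

    LazyResource(Supplier<T> loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    // Assumption: the loader never returns null.
                    result = Objects.requireNonNull(loader.get());
                    value = result;
                }
            }
        }
        return result;
    }
}
```

A feature extractor could then keep one `LazyResource` per resource in a field and call `get()` inside its feature-adding method, so the expensive load happens at most once.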
This constructor is not intended for use -- the version with ResourceManager must be used or the initialize step won't work. An error message is logged, but no exception is thrown: https://github.com/CogComp/cogcomp-nlp/blob/master/corpusreaders/src/main/java/edu/illinois/cs/cogcomp/nlp/corpusreaders/AnnotationReader.java#L35...
@nitishgupta suggested that we add functionality to `TextAnnotation` to convert document-level offsets (say, a constituent's start or end) to sentence-level offsets (i.e., what is its order from the beginning of...
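A minimal sketch of such a conversion, under stated assumptions: the issue text is truncated, so this assumes the target is the token's position counted from the start of its containing sentence. `SentenceOffsets` and `toSentenceLevel` are hypothetical names, not existing `TextAnnotation` methods, and the input is assumed to be a sorted array of exclusive sentence-end token indices.

```java
/**
 * Hypothetical helper (not part of TextAnnotation): maps a document-level
 * token index to {sentenceId, offsetWithinSentence}.
 */
final class SentenceOffsets {
    /**
     * @param sentenceEnds sorted exclusive end token index of each sentence
     *                     (assumed representation)
     * @param docOffset    document-level token index, 0-based
     */
    static int[] toSentenceLevel(int[] sentenceEnds, int docOffset) {
        int n = sentenceEnds.length;
        if (n == 0 || docOffset < 0 || docOffset >= sentenceEnds[n - 1]) {
            throw new IndexOutOfBoundsException("offset " + docOffset);
        }
        // Binary search for the first sentence whose end is past docOffset.
        int lo = 0, hi = n - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sentenceEnds[mid] > docOffset) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        int sentenceStart = (lo == 0) ? 0 : sentenceEnds[lo - 1];
        return new int[] {lo, docOffset - sentenceStart};
    }
}
```

With sentence ends `{3, 7, 10}`, document-level token 6 falls in sentence 1 at position 3. An exclusive constituent end that coincides with a sentence boundary would need to be converted as `end - 1` and incremented afterwards.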
The documentation in core utils is a few years old. Someone needs to read through it and update any deprecated syntax.
Quoting from Dan: "I wonder if it makes sense for us to add some words about stemming in our Lemmatizer documentation; at this point if people search for a stemmer...
I have seen the tokenizer fail on non-UTF-8 characters (e.g. "�" below): ```scala val text = "Rendering software which cannot process a Unicode character appropriately often displays it as an...
So I cleaned my caches on morgoth: ``` - ~/.m2 - ~/.cogcomp-datastore - ~/.cogcomp-datastore-tmp ``` and now I see a test failing in NER: http://morgoth.cs.illinois.edu:5800/viewLog.html?buildId=356&tab=buildResultsDiv&buildTypeId=CogcompNlp_Build The reason must be something...
I am trying to replicate the experiments in the paper. Importance of Semantic Representation: Dataless Classification https://cogcomp.org/page/publication_view/178 However, I cannot find the exact definition of the experiment "binary classification with...
- add a type to the feature extractor.