Richard Eckart de Castilho

Results 441 issues of Richard Eckart de Castilho

Extracting tagsets from HunPos models not supported. Original issue reported on code.google.com by `richard.eckart` on 2014-01-12 19:00:57

🐛Bug
Module-hunpos

StanfordNamedEntityRecognizer does not use existing tokenization. The annotations created by it may not always be colocated with tokens! Original issue reported on code.google.com by `richard.eckart` on 2013-09-16 10:31:15

🐛Bug
Module-stanfordnlp
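
The colocation problem described in this issue can be checked mechanically. The following is a minimal sketch in plain Python, not DKPro Core or Stanford NLP code: the `Span` type, the `misaligned_entities` helper, and the sample offsets are all hypothetical and only illustrate what "not colocated with tokens" means in terms of character offsets.

```python
from typing import List, Tuple

# Hypothetical representation: (begin, end) character offsets, end exclusive.
Span = Tuple[int, int]


def misaligned_entities(tokens: List[Span], entities: List[Span]) -> List[Span]:
    """Return entity spans whose begin or end is not a token boundary."""
    token_begins = {begin for begin, _ in tokens}
    token_ends = {end for _, end in tokens}
    return [
        (begin, end)
        for begin, end in entities
        if begin not in token_begins or end not in token_ends
    ]


# Illustrative data only: four tokens and two entity annotations.
tokens = [(0, 4), (5, 10), (11, 13), (14, 20)]
entities = [(0, 10), (12, 20)]  # the second one starts inside a token

print(misaligned_entities(tokens, entities))
```

An entity is reported when either of its boundaries falls inside or between tokens, which is the situation that arises when a recognizer applies its own tokenization instead of the existing one.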

POS tagset extracted from French maltparser model looks very strange: Tagset [null] for layer [de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS] contains [39] tags: /CC /P /PONCT 4/DET ADJ ADJWH ADV ADVWH CC CLO CLR...

🐛Bug
Module-maltparser

This seems a bit too strict for most purposes. Should be generalized better. Original issue reported on code.google.com by `torsten.zesch` on 2013-08-05 11:20:38

🐛Bug

As John Bauer commented regarding the fetching of the dependency relation tagset: One thing worth noting is that the dependencies list can actually change over time as it comes...

🐛Bug
Module-stanfordnlp

In the current trunk, JWPL has changed from using its own parser to using the SWEBLE parser. The old parser is still available in its own module and is...

🐛Bug
Module-io.jwpl

Currently Stem and Lemma are defined in the Segmentation API. Arguably, they don't have anything to do with that API other than being used as features in Token. The...

🐛Bug
Module-api.lexmorph

Snowball comes with a set of standard stopword lists. Per default the tagger should detect which language a document has and use the standard list for that language. It...

🐛Bug
Module-stopwordremover

Some NER tools such as the `CoreNlpNamedEntityRecognizer` mark every token individually as an NE instead of creating a multi-token NE. ![2018-05-23_11-43-44](https://user-images.githubusercontent.com/1410238/40416821-92ab9c90-5e7e-11e8-9ec4-7e65999fc1a8.png) IMHO the default behavior should be that NEs with...

🐛Bug
Module-opennlp
Module-cogroo
Module-lingpipe
Module-corenlp
Module-nlp4j
Module-lbj
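
One way to turn per-token NEs into multi-token NEs is to merge neighbouring annotations that carry the same label. The sketch below is an assumption about what such a post-processing step could look like, not the behavior proposed in the (truncated) issue text and not DKPro Core code. The `Entity` tuple, the `merge_adjacent` helper, and the `max_gap` parameter are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical representation: (begin, end, label), end offset exclusive.
Entity = Tuple[int, int, str]


def merge_adjacent(entities: List[Entity], max_gap: int = 1) -> List[Entity]:
    """Merge consecutive entities that share a label and are at most
    `max_gap` characters apart (e.g. separated by a single space)."""
    merged: List[Entity] = []
    for begin, end, label in sorted(entities):
        if merged:
            prev_begin, prev_end, prev_label = merged[-1]
            if label == prev_label and begin - prev_end <= max_gap:
                # Extend the previous entity instead of starting a new one.
                merged[-1] = (prev_begin, max(prev_end, end), label)
                continue
        merged.append((begin, end, label))
    return merged


# Illustrative data only: two single-token PERSON annotations, one LOCATION.
per_token = [(0, 6, "PERSON"), (7, 13, "PERSON"), (20, 26, "LOCATION")]
print(merge_adjacent(per_token))
```

A purely label-based merge like this cannot tell two distinct entities of the same type apart when they happen to be adjacent; taggers that emit BIO-style prefixes carry the information needed to avoid that, so the sketch should be read as a fallback for tools that do not.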

The CoNLL 2008 format/data seems to include some oddities where the token text column is unset, e.g. hyphenated words: ``` 14 entry-level entry-level NN _ entry entry NN 16 HMOD...

🐛Bug
Module-io.conll