Richard Eckart de Castilho issues

Results: 441 issues of Richard Eckart de Castilho

Extracting tagsets from HunPos models not supported

Extracting tagsets from HunPos models not supported. Original issue reported on code.google.com by `richard.eckart` on 2014-01-12 19:00:57

🐛Bug

Module-hunpos

StanfordNamedEntityRecognizer does not use existing tokenization

StanfordNamedEntityRecognizer does not use existing tokenization. The annotations created by it may not always be colocated with tokens! Original issue reported on code.google.com by `richard.eckart` on 2013-09-16 10:31:15

🐛Bug

Module-stanfordnlp
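
To illustrate what "not colocated with tokens" means in the report above, here is a minimal sketch in Python (not DKPro Core code). It assumes tokens and entities are plain character-offset tuples; the function name `misaligned_entities` and the tuple layout are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Token = Tuple[int, int]        # (begin, end) character offsets, assumed layout
Entity = Tuple[int, int, str]  # (begin, end, label), assumed layout


def misaligned_entities(tokens: List[Token], entities: List[Entity]) -> List[Entity]:
    """Return entities whose begin or end offset does not fall on a token boundary."""
    token_begins = {begin for begin, _ in tokens}
    token_ends = {end for _, end in tokens}
    return [
        (begin, end, label)
        for begin, end, label in entities
        # An entity is aligned only if it starts where some token starts
        # and ends where some token ends.
        if begin not in token_begins or end not in token_ends
    ]
```

An empty result would indicate that every entity annotation starts and ends on an existing token boundary.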

POS tagset extracted from French maltparser model looks very strange

POS tagset extracted from French maltparser model looks very strange: Tagset [null] for layer [de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS] contains [39] tags: /CC /P /PONCT 4/DET ADJ ADJWH ADV ADVWH CC CLO CLR...

🐛Bug

Module-maltparser

XmlReader assumes all elements to be on the second level

This seems a bit too strict for most purposes. Should be generalized better. Original issue reported on code.google.com by `torsten.zesch` on 2013-08-05 11:20:38

🐛Bug

Reported dependency tagset may be incomplete

As John Bauer commented regarding the fetching of the dependency relation tagset: One thing worth noting is that the dependencies list can actually change over time as it comes...

🐛Bug

Module-stanfordnlp

Migrate all readers from the JWPL Parser to SWEBLE

In the current trunk, JWPL has changed from using its own parser to using the SWEBLE parser. The old parser is still available in its own module and is...

🐛Bug

Module-io.jwpl

Stem and Lemma should be defined in LexMorph API

Currently Stem and Lemma are defined in the Segmentation API. Arguably, they don't have anything to do with that API other than being used as features in Token. The...

🐛Bug

Module-api.lexmorph

Automatically use standard stopword lists depending on document language

Snowball comes with a set of standard stopword lists. Per default the tagger should detect which language a document has and use the standard list for that language. It...

🐛Bug

Module-stopwordremover

Some NER tools do not mark multi-word NEs

Some NER tools such as the `CoreNlpNamedEntityRecognizer` mark every token individually as a NE instead of creating a multi-token NE. ![2018-05-23_11-43-44](https://user-images.githubusercontent.com/1410238/40416821-92ab9c90-5e7e-11e8-9ec4-7e65999fc1a8.png) IMHO the default behavior should be that NEs with...

🐛Bug

Module-opennlp

Module-cogroo

Module-lingpipe

Module-corenlp

Module-nlp4j

Module-lbj
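
The difference between per-token and multi-token NEs described above can be sketched as a post-processing step. This is a minimal Python illustration, not the behavior of any of the listed modules. It assumes entities are `(begin, end, label)` character-offset tuples and that adjacent entities with the same label, separated only by whitespace, belong to one NE. That assumption is a simplification: without explicit boundary markers, two distinct neighbouring entities of the same type would also be merged. The name `merge_adjacent_entities` is hypothetical.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (begin, end, label), assumed layout


def merge_adjacent_entities(entities: List[Entity], text: str) -> List[Entity]:
    """Merge same-label entities separated only by whitespace into one span."""
    merged: List[Entity] = []
    for begin, end, label in sorted(entities):
        if merged:
            prev_begin, prev_end, prev_label = merged[-1]
            gap = text[prev_end:begin]
            # Merge only if the label matches and nothing but whitespace
            # lies between the previous entity and this one.
            if prev_label == label and gap.strip() == "":
                merged[-1] = (prev_begin, max(prev_end, end), label)
                continue
        merged.append((begin, end, label))
    return merged
```

For the text `Barack Obama visited New York .` with one `PERSON` annotation per name token and one `LOCATION` annotation per place token, this yields one `PERSON` span and one `LOCATION` span.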

CoNLL 2008 reader does not handle non-token lines

The CoNLL 2008 format/data seems to include some oddities where the token text column is unset, e.g. hyphenated words: ``` 14 entry-level entry-level NN _ entry entry NN 16 HMOD...

🐛Bug

Module-io.conll