dkpro-core

Some parsers fail when no POS annotations are present

Open • reckart opened this issue 10 years ago • 3 comments

I haven't checked all of them, but at least MstParser and ClearNlpDependencyParser fail with hard-to-interpret exceptions thrown deep inside the parser when no POS annotations are present.

POS is declared as a required input type, but since we have never actually checked declared input types so far, no warning is issued.

This is especially confusing because StanfordParser works even without POS annotations, so simply swapping one parser for another in a setup that worked before leads to an unexpected failure.

Could all annotators with type capabilities actually check whether these annotation layers are present? Or is this too expensive to enable automatically, e.g. in the implBase?
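A minimal sketch of what an automatic check in a shared base class could look like, assuming the base class knows its declared input types and can tell which types occur in the current document. `CapabilityChecker` is hypothetical, the document is modelled as a plain set of type names rather than a UIMA CAS, and the check only warns instead of failing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch, not the UIMA or DKPro Core API: type names are plain strings and
// the document contents are modelled as the set of type names that occur in it.
class CapabilityChecker {
    private static final Logger LOG = Logger.getLogger(CapabilityChecker.class.getName());

    // Returns the declared input types that do not occur in the document, in declaration order.
    static List<String> missingInputTypes(List<String> declaredInputs, Set<String> presentTypes) {
        List<String> missing = new ArrayList<>();
        for (String type : declaredInputs) {
            if (!presentTypes.contains(type)) {
                missing.add(type);
            }
        }
        return missing;
    }

    // Intended to be called once per document before processing; it only warns,
    // because an absent input type is not necessarily an error.
    static void warnAboutMissingInputs(String component, List<String> declaredInputs,
            Set<String> presentTypes) {
        List<String> missing = missingInputTypes(declaredInputs, presentTypes);
        if (!missing.isEmpty()) {
            LOG.warning(component + ": declared input types not present in document: " + missing);
        }
    }
}
```

In this sketch the cost is one set lookup per declared input type and document; whether a missing type should be treated as an error at all is a separate question.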

Original issue reported on code.google.com by torsten.zesch on 2014-08-06 16:06:42

reckart • May 12 '15 22:05

Consider a document containing only stopwords.
A segmenter creates tokens for these.
A stopword remover removes all tokens.
A parser runs and finds no tokens.

This parser should not fail, it should simply do nothing.

Input capabilities do not mean that a component must fail if no annotations of the
given type are present. They only mean that the component may use this information;
at least, that is my understanding given the example above.

We should fail with a proper message if illegal combinations of annotations are encountered,
e.g. if a parser finds a Token that has no POS tag or if it finds a POS tag that has
no value.
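A minimal sketch of such a check, under the assumption that the parser inspects its tokens before doing any work. An empty token list passes (the stopword case above), while a token without a POS tag, or a POS tag without a value, fails immediately with a message naming the token. `Token`, `Pos`, and `PosInputValidator` are simplified hypothetical stand-ins, not the DKPro Core types.

```java
import java.util.List;

// Hypothetical stand-in for a POS annotation (not the DKPro Core type).
class Pos {
    final String value; // null or empty is an illegal state for a parser
    Pos(String value) { this.value = value; }
}

// Hypothetical stand-in for a Token annotation (not the DKPro Core type).
class Token {
    final String text;
    final int begin;
    final Pos pos; // null if no POS tagger ran
    Token(String text, int begin, Pos pos) { this.text = text; this.begin = begin; this.pos = pos; }
}

class PosInputValidator {
    // Accepts an empty token list (nothing to parse, nothing wrong), but rejects
    // illegal combinations with a message that names the offending token.
    static void validate(List<Token> tokens) {
        for (Token t : tokens) {
            if (t.pos == null) {
                throw new IllegalStateException("Token '" + t.text + "' at offset " + t.begin
                        + " has no POS tag; run a POS tagger before this parser.");
            }
            if (t.pos.value == null || t.pos.value.isEmpty()) {
                throw new IllegalStateException("POS tag on token '" + t.text + "' at offset "
                        + t.begin + " has no value.");
            }
        }
    }
}
```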

We might also want to fail in cases where we know that a model makes use of some information
that is not present in the CAS, e.g. MaltParser fails if it finds that the model it
uses needs lemma information but there is no lemma information available on a token
(this check can also be turned off by setting PARAM_IGNORE_MISSING_FEATURES to true).
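A minimal sketch of the MaltParser-style behaviour described here, assuming the model can report which token features it uses. `ModelFeatureCheck` and its `ignoreMissingFeatures` argument are hypothetical; the argument only mirrors the intent of PARAM_IGNORE_MISSING_FEATURES and is not its implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch, not the MaltParser or DKPro Core API: the model lists the token
// features it needs, and a flag modelled on PARAM_IGNORE_MISSING_FEATURES decides
// whether a missing feature is an error or only a warning.
class ModelFeatureCheck {
    private static final Logger LOG = Logger.getLogger(ModelFeatureCheck.class.getName());

    // tokenFeatures maps feature names (e.g. "pos", "lemma") to values; a missing key
    // or null value counts as missing. Returns true only if every required feature is present.
    static boolean check(List<String> requiredByModel, Map<String, String> tokenFeatures,
            boolean ignoreMissingFeatures) {
        boolean allPresent = true;
        for (String feature : requiredByModel) {
            if (tokenFeatures.get(feature) == null) {
                String msg = "Model requires feature '" + feature + "' but the token has no value for it";
                if (!ignoreMissingFeatures) {
                    throw new IllegalStateException(msg);
                }
                LOG.warning(msg + "; continuing because missing features are ignored");
                allPresent = false;
            }
        }
        return allPresent;
    }
}
```

Logging per token would flood the log on large documents, so a real component would more likely report once per document.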

So yes, we should handle this better.

But I believe using type capabilities is not going to take us anywhere.

Original issue reported on code.google.com by richard.eckart on 2014-08-06 17:06:07

>> This parser should not fail, it should simply do nothing.

But it would be nice if the parser issued a message saying that it did nothing and why.
I recently had a similar issue with the StanfordParser: I read in an annotated corpus,
but the Sentence annotations were missing.
It took me a while to realize that
1. the parser did nothing, and
2. it did nothing because the Sentence annotations were missing.
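A minimal sketch of the kind of message asked for here. It separates the harmless stopword case from the one described above, where tokens exist but Sentence annotations are missing, and logs the reason in both. The class and method names and the use of `java.util.logging` are assumptions for illustration; a real DKPro Core component would use its own logging and CAS access.

```java
import java.util.logging.Logger;

// Hypothetical sketch, not the DKPro Core API: the annotation counts stand in for
// looking up Sentence and Token annotations in the CAS.
class ExplainingParser {
    private static final Logger LOG = Logger.getLogger(ExplainingParser.class.getName());

    // Returns why the document would be skipped, or null if it can be parsed.
    static String skipReason(int sentenceCount, int tokenCount) {
        if (tokenCount == 0) {
            return "no Token annotations, nothing to parse";
        }
        if (sentenceCount == 0) {
            return "no Sentence annotations, although " + tokenCount + " tokens are present";
        }
        return null;
    }

    static void process(String documentId, int sentenceCount, int tokenCount) {
        String reason = skipReason(sentenceCount, tokenCount);
        if (reason == null) {
            // ... parse each sentence here ...
            return;
        }
        if (tokenCount == 0) {
            // e.g. every token was a stopword and got removed: expected, so low severity
            LOG.info("Parser skipped document " + documentId + ": " + reason);
        } else {
            // tokens without sentences usually means a missing upstream step
            LOG.warning("Parser skipped document " + documentId + ": " + reason);
        }
    }
}
```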

Original issue reported on code.google.com by eckle.kohler on 2014-08-06 17:16:42

OK, Richard is of course right.
The problem is real; only my proposed solution is not the best one.

The parsers should at least output a warning if they rely on POS but no POS
annotations are present.
Currently they do not even silently do nothing; they throw an exception, which is definitely
not how this should behave :)

Original issue reported on code.google.com by torsten.zesch on 2014-08-06 17:21:57