Mark Sammons

25 comments by Mark Sammons

Can you generate a test case that illustrates this?

I don't think a TextAnnotation object should be created for such cases, because no annotations can be generated for them (at least, not with our current tooling). However, the failure...
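As a rough illustration of rejecting such input up front, here is a minimal sketch of a guard a caller could run before building a TextAnnotation. The class and method names (AnnotationInputGuard, hasAnnotatableText, requireAnnotatableText) are hypothetical and not part of cogcomp-nlp. "Whitespace" here means Java's Character.isWhitespace, via String.isBlank (Java 11+).

```java
/**
 * Hypothetical guard (not part of cogcomp-nlp): decide whether input text is
 * worth turning into a TextAnnotation. Text with no non-whitespace characters
 * produces no tokens, so no views could be populated for it.
 */
final class AnnotationInputGuard {

    private AnnotationInputGuard() {}

    /** True if the text has at least one non-whitespace character (String.isBlank, Java 11+). */
    static boolean hasAnnotatableText(String text) {
        return text != null && !text.isBlank();
    }

    /**
     * Fail fast with a clear message instead of letting an empty or unusable
     * annotation reach downstream code.
     */
    static String requireAnnotatableText(String text) {
        if (!hasAnnotatableText(text)) {
            throw new IllegalArgumentException("no non-whitespace text to annotate");
        }
        return text;
    }
}
```

Failing at the entry point keeps the "nothing to annotate" case explicit, instead of leaving every consumer to discover empty views later.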

@nitishgupta What is the use case? Would a TextAnnotation with no non-whitespace text be useful in some way?

I disagree. Users of the resulting TextAnnotation will likely have to check for empty views, or else the unlucky client who is using their code as an intermediary will have to. How...

The last version of the tokenizer that had tests (for which we used the MASC corpus) is here: https://gitlab-beta.engr.illinois.edu/cogcomp/illinois-tokenizer/tree/master

Modified to separate the two issues. The sentences field still bothers me, as it has caused problems when end users accessed .sentences instead of the containing TextAnnotation, but maybe this...

Just reasserting my vote to remove this duplicative field: changing the Sentence view requires an explicit call to TextAnnotation.setSentences() to update the .sentences field; if you forget, the TextAnnotation has...
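To make the synchronization problem concrete, here is a minimal sketch using a simplified, hypothetical model. SimpleAnnotation, setSentenceView, refreshCache and cachedSentences are illustrative names, not the real TextAnnotation API. It contrasts a cached copy that must be refreshed by hand, playing the role of .sentences and setSentences(), with an accessor derived directly from the view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified model (not the real cogcomp-nlp classes) showing how a
 * duplicated sentence list goes stale, and how deriving it on demand avoids that.
 */
class SimpleAnnotation {
    // Stand-in for the Sentence view: the single source of truth.
    private final List<String> sentenceView = new ArrayList<>();

    // Stand-in for the duplicated .sentences field: a snapshot refreshed only by hand.
    private List<String> cachedSentences = new ArrayList<>();

    void setSentenceView(List<String> sentences) {
        sentenceView.clear();
        sentenceView.addAll(sentences);
        // cachedSentences is deliberately NOT updated here; callers must remember refreshCache().
    }

    /** Analogue of an explicit setSentences() call that resynchronizes the copy. */
    void refreshCache() {
        cachedSentences = new ArrayList<>(sentenceView);
    }

    /** Duplicated field: may be out of date if refreshCache() was forgotten. */
    List<String> cachedSentences() {
        return Collections.unmodifiableList(cachedSentences);
    }

    /** Derived accessor: a read-only view of the live list, so it is never stale. */
    List<String> sentences() {
        return Collections.unmodifiableList(sentenceView);
    }
}
```

Removing the duplicate and keeping only the derived accessor eliminates the "forgot to call the setter" failure mode entirely.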

Incidentally: making the Constituents field of View a TreeMap would address an inefficiency in the current code (see https://github.com/CogComp/cogcomp-nlp/blob/master/core-utilities/src/main/java/edu/illinois/cs/cogcomp/core/datastructures/textannotation/SpanLabelView.java#L65)

They are separate issues, but changing to a TreeMap would automatically fix the duplication problem. Given that we want to order constituents for return via View.getConstituents() anyway (and we impose ordering...
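A minimal sketch of the TreeMap idea follows. It assumes constituents are keyed by (start, end, label), which is an assumption rather than what SpanLabelView actually does. OrderedConstituents and its nested Constituent class are hypothetical stand-ins. With this key, iteration comes out in span order without a per-call sort, and re-adding an identical constituent replaces the existing entry instead of duplicating it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not the actual View/SpanLabelView code): constituents stored in
 * a TreeMap keyed by (start, end, label), giving ordered iteration and no duplicates.
 */
class OrderedConstituents {

    /** Minimal stand-in for a Constituent: token span [start, end) plus a label. */
    static final class Constituent {
        final int start;
        final int end;
        final String label;

        Constituent(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    // Order by start, then end, then label; entries with equal keys collapse into one.
    private static final Comparator<Constituent> BY_SPAN =
            Comparator.<Constituent>comparingInt(c -> c.start)
                    .thenComparingInt(c -> c.end)
                    .thenComparing(c -> c.label);

    private final TreeMap<Constituent, Constituent> constituents = new TreeMap<>(BY_SPAN);

    /** O(log n) insert; an identical (start, end, label) key overwrites rather than appends. */
    void add(Constituent c) {
        constituents.put(c, c);
    }

    /** Values are already in key order, so no sort is needed on each call. */
    List<Constituent> getConstituents() {
        return new ArrayList<>(constituents.values());
    }
}
```

Whether the label (or other attributes) belongs in the key is a design choice. Leaving it out would collapse differently labeled constituents over the same span, which may not be wanted.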

Sorry, wires crossed. I was thinking about a separate issue that I think also touches on span overlap: duplicate constituents. I agree with your proposed solution of moving the bounds check.