Mark Sammons

25 comments by Mark Sammons

This is really a tokenization/sentence splitter issue: the sentence annotator relies on the boundaries that the tokenizer provides.
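To illustrate the dependency, here is a minimal sketch in Python. It is not the CogComp tokenizer or sentence annotator; `split_sentences` and the token lists are hypothetical. It only shows that a splitter working on token boundaries cannot place a sentence break inside a token the tokenizer failed to split.

```python
from typing import List

SENTENCE_END = {".", "!", "?"}  # assumed end-of-sentence tokens for this sketch


def split_sentences(tokens: List[str]) -> List[List[str]]:
    """Group tokens into sentences, closing a sentence after each end token.

    Hypothetical helper: sentence boundaries can only fall between tokens.
    """
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without a closing end token
        sentences.append(current)
    return sentences


# Correct tokenization: the period is its own token, so two sentences result.
good_tokens = ["It", "rained", ".", "We", "left", "."]
# Faulty tokenization: "rained.We" was never split, so the boundary is lost.
bad_tokens = ["It", "rained.We", "left", "."]

good_sentences = split_sentences(good_tokens)
bad_sentences = split_sentences(bad_tokens)
```

With the faulty token list, the splitter returns a single sentence, which is why the fix belongs in tokenization or text cleanup and not in the sentence annotator.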

We have some cleanup code for this kind of problem: https://github.com/CogComp/cogcomp-nlp/blob/master/core-utilities/src/main/java/edu/illinois/cs/cogcomp/core/utilities/TextCleanerStringTransformation.java and https://github.com/CogComp/cogcomp-nlp/blob/master/core-utilities/src/main/java/edu/illinois/cs/cogcomp/core/utilities/StringTransformationCleanup.java. If these don't cover such cases, this is where the fixes should be added. We could, by default,...

Should be OK: the amended JSON deserializer would simply ignore the additional value. Note that this JSON redundancy directly reflects a redundancy in the TextAnnotation/View data structures.
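As a rough sketch of why an extra value is harmless to a tolerant reader, the Python below keeps only the fields it knows about. This is not the CogComp JSON serializer; the field names (`label`, `start`, `end`, `redundant_label`) and `read_constituent` are assumptions made up for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical set of fields the reader understands.
KNOWN_FIELDS = {"label", "start", "end"}


def read_constituent(raw: str) -> Dict[str, Any]:
    """Parse a JSON object and silently drop any field not in KNOWN_FIELDS."""
    data = json.loads(raw)
    return {key: value for key, value in data.items() if key in KNOWN_FIELDS}


# The second payload carries an additional, redundant value.
old_payload = '{"label": "PER", "start": 0, "end": 2}'
new_payload = '{"label": "PER", "start": 0, "end": 2, "redundant_label": "PER"}'

old_record = read_constituent(old_payload)
new_record = read_constituent(new_payload)
```

Both payloads deserialize to the same record, so data written with the additional value stays readable.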

@mayhewsw, there were ways to add constituents that sidestepped the overlap check. This issue is intended to fix the problem.

@ChaseDuncan looks like CI build is failing -- please take a look...

I think it is reasonable to update any code that violates the assumption so that it explicitly allows overlapping constituents, since that is going on anyway. For posterity,...

@Slash0BZ I think the mention view should allow overlap, and that you should provide a utility method in the relevant code that verifies no constraints are violated.
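A verification utility of this kind could look roughly like the Python sketch below. It is an assumption-laden illustration, not code from cogcomp-nlp: spans are modeled as hypothetical `(start, end)` token offsets with an exclusive end, spans are assumed non-empty, and `find_overlaps` is an invented name.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive (assumed)


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of spans that share at least one token.

    Spans are sorted by start offset, so the inner scan can stop as soon as
    a later span starts at or after the current span's end.
    """
    ordered = sorted(spans)
    overlaps = []
    for i, (start_a, end_a) in enumerate(ordered):
        for start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # no later span can overlap this one
            overlaps.append(((start_a, end_a), (start_b, end_b)))
    return overlaps


adjacent = find_overlaps([(0, 2), (2, 4)])        # touching, not overlapping
crossing = find_overlaps([(0, 3), (2, 4)])        # share token 2
nested = find_overlaps([(0, 5), (1, 2), (3, 4)])  # two spans inside a third
```

A view that permits overlap would add constituents freely and call such a check only where a caller needs to confirm that a particular constraint holds.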

I agree that we should replace CogComp's StringUtils with the Apache Commons library. There are probably a fair number of uses throughout the package, though.

@cowchipkid this was in the context of using NER; I'm pretty sure your recent changes are relevant and might have solved this issue. Any thoughts?