python-gatenlp
Python text processing, pattern matching, and NLP framework
Dear all, I was able to configure a pipeline to identify aliases in the GATE UI, but I have not been able to find an example of how to configure gazetteers to...
Currently, the GATE editor includes the OrthoMatcher by default, so I can identify aliases such as ibm -> International Business Machines. I therefore wonder whether such functionality is currently available...
**Is your feature request related to a problem? Please describe.** I currently can use a text file with a list of terms to create a StringGazetteer, which I can use...
**Describe the bug** Getters don't fail silently if the named group doesn't exist, because `_get_match` returns None and `.get()` is then called on None: https://github.com/GateNLP/python-gatenlp/blob/29f5390ca850143f02cfefac3b635835067eae3b/gatenlp/pam/pampac/getters.py#L230 **To Reproduce** ``` # Add annotation...
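The failure mode described in this report can be sketched in plain Python. This is only an illustration under stated assumptions: `get_match`, `get_feature_unsafe`, and `get_feature_safe` are hypothetical stand-ins and are not the actual `gatenlp` getter code. The sketch shows why calling `.get()` on a `None` lookup result raises, and one possible guard that fails silently instead.

```python
from typing import Any, Dict, List, Optional

Match = Dict[str, Any]


def get_match(matches: List[Match], name: str) -> Optional[Match]:
    """Hypothetical lookup: return the first match with this name, else None."""
    for match in matches:
        if match.get("name") == name:
            return match
    return None


def get_feature_unsafe(matches: List[Match], name: str, key: str) -> Any:
    # If no match has this name, get_match returns None and .get() raises
    # AttributeError: this mirrors the behaviour the report describes.
    return get_match(matches, name).get(key)


def get_feature_safe(
    matches: List[Match], name: str, key: str, default: Any = None
) -> Any:
    # Guard against the missing match first, so the getter fails silently.
    match = get_match(matches, name)
    if match is None:
        return default
    return match.get(key, default)
```

Whether a missing named group should yield a default value or a clearer exception is a design decision for the maintainers; the guard above only illustrates the silent variant.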
**Is your feature request related to a problem? Please describe.** I would like to clean up annotations that are used to define more specific things. Complex patterns of variable repeating...
https://github.com/GateNLP/python-gatenlp/blob/1b2d974f5d4744419ec0ce88ac5c062405e1c0c8/gatenlp/pam/pampac/pampac_parsers.py#L607 Current: `annstocheck = useset.startin_ge(result.span.end)` Meant to be: `annstocheck = useset.start_ge(result.span.end)`? Or: `annstocheck = useset.start_min_ge(result.span.end)`?
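To make the difference between the two candidate fixes concrete, here is a minimal sketch using plain Python lists rather than a `gatenlp` annotation set. It assumes, going only by the method names, that `start_ge(offset)` selects every annotation starting at or after `offset`, while `start_min_ge(offset)` selects only those at the smallest such start offset. The `Ann` tuple and both functions are hypothetical stand-ins, and the issue itself does not confirm these semantics.

```python
from typing import List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical stand-in for an annotation: offsets plus a type name."""
    start: int
    end: int
    type: str


def start_ge(anns: List[Ann], offset: int) -> List[Ann]:
    # Assumed meaning: all annotations whose start offset is >= offset.
    return [a for a in anns if a.start >= offset]


def start_min_ge(anns: List[Ann], offset: int) -> List[Ann]:
    # Assumed meaning: only the annotations at the smallest start offset
    # that is >= offset, i.e. the next position where anything begins.
    candidates = start_ge(anns, offset)
    if not candidates:
        return []
    first = min(a.start for a in candidates)
    return [a for a in candidates if a.start == first]
```

Under these assumptions, a matcher that continues after `result.span.end` would consider every later annotation with the first variant, but only the annotations at the next start position with the second.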
Especially also checking the impact of the owning set field! Especially: * remove(annotationinstance): * the instance must be directly included in the set, based on the hash function used which...
See
* https://gatenlp.github.io/python-gatenlp/formats
* https://gatenlp.github.io/gateplugin-Format_Bdoc/
* https://gatenlp.github.io/gateplugin-Format_Bdoc/bdoc_document.html
* need to find out which dependencies are absolutely needed for the base functionality (including pampac)
* may need to install more dependencies for base installation (default extra) than we...
This is very complex code and we need lots of unit tests to test just a fraction of what could be done with it.