python-gatenlp
                                
                                
                                
Add util func: virtual text based on ann features
Similar to the StringAnnotation plugin, but more flexible.
It should support producing one or more texts (based on split annotations), optionally inserting separator characters, inserting text conditionally, and inserting text computed by a lambda.
The function should return the text and the offset mapping. It should also be possible to specify placeholder text for specific annotation types: instead of retrieving the text from a feature or from the underlying document, some constant text (e.g. a single space) is added whenever such an annotation is encountered.
Possible signature: text, offsetmap = Document.virtualtext(...) with parameters:
- annset: annotations from this set are used
- anndescs: a list of descriptors for the annotations to use, where each descriptor is one of:
  - a string: the annotation type; the underlying document text is used
  - a dictionary with the keys type, feature, text: if feature is specified, str() of the feature value is used; if text is specified, it is used when feature is not specified or its value is None
  - the order of the list gives the priority of the annotation types; if there are several annotations of the same type, an arbitrary one is used
  - after an annotation has been used, the next one is retrieved starting from the position after the end of the used one
  - if the annotation that would be used ends after the end of the within range, it is not used and processing ends
- within: a span or an annotation from any set; only annotations within that span are processed
- gap: string to insert between segments without annotation; if None, nothing is inserted
 
The offset map is just an array[int] with the same length as the returned text, containing the document offset for each text offset.
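
One typical use of such an offset map is to translate a match found in the virtual text back to document offsets. The following is a minimal sketch, assuming the map is a plain list of ints as described; `to_doc_span` is a hypothetical helper name, and the text and map are built by hand for the example.

```python
import re


def to_doc_span(offsetmap, start, end):
    """Map the half-open span [start, end) of the virtual text to document offsets."""
    if not 0 <= start < end <= len(offsetmap):
        raise ValueError("span lies outside the virtual text")
    # Exact for characters copied from the document; for substituted text the
    # result depends on which offsets the map stores for those characters.
    return offsetmap[start], offsetmap[end - 1] + 1


# Hand-built example: "Smith went" taken from the document "Dr. Smith went home".
text = "Smith went"
offsetmap = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
match = re.search(r"went", text)
doc_span = to_doc_span(offsetmap, match.start(), match.end())  # (10, 14)
```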