GreynirEngine
Is it possible to access terminals for unparsed text?
I'm trying to render the original sentences that I parse with Greynir verbatim as HTML, while inserting additional data (e.g. lemmas, parts of speech) into the HTML as well. However, it's not clear whether all of the original text is recoverable from Greynir's results. For instance, if a sentence contains multiple consecutive spaces, `tidy_text` reduces them to a single space, and `terminals` doesn't show spaces at all.
For comparison, spaCy lets you recover the input text from its output. Is there a way to do this, so that I can iterate over terminals and unparsed text together? I did see that periods are stored as a terminal with no category, so presumably raw terminals could be stored the same way, but judging from `tidy_text`'s behaviour I'm assuming the data might not be stored at all.
I'm looking at inserting the missing context back into the results as a workaround; I'm just curious whether there are any methods or attributes that already give me the information I need.
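
A minimal sketch of such a workaround, assuming each token's text appears verbatim and in order in the original string (this may not hold if tokens are normalised or merged during parsing). `align_tokens` and `rebuild` are hypothetical helpers, not part of Greynir; the tokens are passed in as plain strings, however they are obtained:

```python
from typing import List, Tuple


def align_tokens(original: str, tokens: List[str]) -> Tuple[List[Tuple[str, str]], str]:
    """Pair each token with the original text (e.g. whitespace) that precedes it.

    Returns (pairs, tail), where pairs is a list of (gap, token) tuples and
    tail is whatever remains of the original after the last token.
    """
    pairs = []
    pos = 0
    for tok in tokens:
        # Search forward only, so repeated tokens match in order.
        idx = original.find(tok, pos)
        if idx == -1:
            raise ValueError(f"token {tok!r} not found after offset {pos}")
        pairs.append((original[pos:idx], tok))
        pos = idx + len(tok)
    return pairs, original[pos:]


def rebuild(pairs: List[Tuple[str, str]], tail: str) -> str:
    """Reassemble the original string from the aligned pairs."""
    return "".join(gap + tok for gap, tok in pairs) + tail
```

Each gap can then be emitted as-is into the HTML between the annotated tokens. If a token's text was altered during parsing, `align_tokens` raises `ValueError` rather than silently misaligning.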