
terminate called after throwing an instance of 'Exception' what(): Error: Malformed input stream.

Open eagad opened this issue 4 years ago • 13 comments

I am running the Apertium analyzer from a Python script and get this exception, which terminates the script immediately. I am not able to catch it inside Python; it seems to be thrown in C++ and never handled. How can I handle it?

    terminate called after throwing an instance of 'Exception'
      what():  Error: Malformed input stream.
    Aborted (core dumped)

To replicate the issue:

    import apertium
    apertium.analyze('en', 'Hi/Hello')

eagad avatar Mar 08 '21 20:03 eagad

It's because you have an unescaped / in your input string.

mr-martian avatar Mar 08 '21 20:03 mr-martian

How would you escape it?

apertium.analyze('en', r'Hi/Hello')

throws the same exception

eagad avatar Mar 08 '21 20:03 eagad

'Hi\\/Hello'

the escape has to get to the underlying pipe

mr-martian avatar Mar 08 '21 20:03 mr-martian
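A quick check of what these Python literals actually contain may help here. The raw-string prefix changes nothing when the literal holds no backslash, and `'Hi\\/Hello'` carries exactly one backslash. This is plain Python with no Apertium call, so it says nothing about what `analyze()` does with the string afterwards.

```python
# What each literal from the thread actually contains.
plain = 'Hi/Hello'
raw = r'Hi/Hello'        # raw prefix is a no-op here: the literal has no backslash
escaped = 'Hi\\/Hello'   # Python collapses \\ to a single backslash

print(plain == raw)      # True
print(len(escaped))      # 9: H, i, \, /, H, e, l, l, o
print(escaped)           # Hi\/Hello
```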

This still didn't work

apertium.analyze('en', 'Hi\\/Hello')

    terminate called after throwing an instance of 'Exception'
      what():  Error: Malformed input stream.
    Aborted (core dumped)

Also, is there a specific list of characters that need to be escaped?

eagad avatar Mar 08 '21 20:03 eagad

https://wiki.apertium.org/wiki/Apertium_stream_format

mr-martian avatar Mar 08 '21 21:03 mr-martian

Try adding another backslash ? :)

ftyers avatar Mar 08 '21 21:03 ftyers

It seems that backslashes are only interpreted as literal backslashes here... Any ideas other than removing all the forward slashes from the text I am trying to process?

eagad avatar Mar 08 '21 21:03 eagad

Probably what this indicates is that there should be a way to have analyze() invoke deformatters, if there isn't one already.

mr-martian avatar Mar 08 '21 22:03 mr-martian

Also, I think this should actually be on https://github.com/apertium/apertium-python but for some reason I am not able to transfer it there.

mr-martian avatar Mar 08 '21 22:03 mr-martian

Dear colleagues, thank you for your work.

How do I fix this? Is there a workaround, maybe?

Minimal example:

    import re
    from typing import List

    import apertium
    from streamparser import LexicalUnit

    ESC_PATTERN = re.compile("([/^$<>*{}\\\\@#+~])", re.UNICODE)
    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."
    # Prefix every reserved character with backslashes before analysis
    text = re.sub(ESC_PATTERN, r"\\\\\1", text.strip())
    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    print([lexical_unit.wordform for lexical_unit in analysis])

Output

Кыргызстанда ВИЧ\\/СПИД менен күрөшүүгө акча жетишпейт.
Error: malformed input stream: Found unexpected character / unescaped in stream
: iostream error
['Кыргызстанда', 'ВИЧ', '\\\\/\\\\<sent>']

Thanks in advance.

alexeyev avatar Jul 25 '23 11:07 alexeyev
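One reading of the output above, offered as a sketch rather than a confirmed fix: the replacement `r"\\\\\1"` writes two backslashes before each reserved character. In the stream format a backslash escapes the character that follows it, so `\\/` would read as an escaped backslash followed by a bare `/`, which matches the error message. The hypothetical helper `escape_reserved` below writes a single backslash instead. The character set is copied from the example above, and whether `analyze()` then accepts the text is not established in this thread (an earlier attempt with a single backslash also failed).

```python
import re

# Reserved characters, copied from the pattern in the example above.
ESC_PATTERN = re.compile(r"([/^$<>*{}\\@#+~])")


def escape_reserved(text: str) -> str:
    """Prefix each reserved character with exactly one backslash."""
    # In a raw replacement string, "\\" is one literal backslash and
    # "\1" is the matched character.
    return ESC_PATTERN.sub(r"\\\1", text)


def escape_doubled(text: str) -> str:
    """The replacement used in the example above: two backslashes."""
    return ESC_PATTERN.sub(r"\\\\\1", text)


print(escape_reserved("ВИЧ/СПИД"))  # ВИЧ\/СПИД
print(escape_doubled("ВИЧ/СПИД"))   # ВИЧ\\/СПИД
```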

My own workaround is the following

    from typing import List

    import apertium
    from streamparser import LexicalUnit

    SPECIAL_CHARACTERS = list("/^$<>*{}\\@#+~")
    REPLACEMENTS = ["slashchar", "capchar", "dollarchar", "lesschar", "morechar", "astchar",
                    "curlyleftchar", "curlyrightchar", "backslashchar", "atchar", "hashchar",
                    "pluschar", "tildechar"]

    assert len(SPECIAL_CHARACTERS) == len(REPLACEMENTS)

    # Two-way mapping between reserved characters and placeholder words
    spchar2code = {ch: co for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}
    code2spchar = {co: ch for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}

    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."

    # Swap each reserved character for its space-padded placeholder
    for spc in spchar2code:
        text = text.replace(spc, f" {spchar2code[spc]} ")

    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    # Map placeholder tokens back to the characters they stand for
    tokens = [lu.wordform if lu.wordform not in code2spchar else code2spchar[lu.wordform] for lu in analysis]
    print(tokens)

but clearly that's not how the cool kids should do it.

alexeyev avatar Jul 25 '23 11:07 alexeyev

I would maybe just send it through apertium-destxt, though I don't know whether apertium-python has some built-in way or you have to call subprocess.communicate yourself.

unhammer avatar Jul 25 '23 15:07 unhammer
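A minimal sketch of the subprocess route suggested above, under stated assumptions: `apertium-destxt` is installed and on `PATH`, and it reads text on stdin and writes the deformatted stream to stdout. The helper name `run_filter` is hypothetical and not part of apertium-python. How the result should then be fed to the analyzer is not covered in this thread.

```python
import subprocess
from typing import Sequence


def run_filter(text: str, cmd: Sequence[str] = ("apertium-destxt",)) -> str:
    """Pipe text through an external command and return its stdout.

    Assumption: the command reads stdin and writes stdout, and
    apertium-destxt is installed and on PATH.
    """
    result = subprocess.run(
        list(cmd),
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8")


# Example call (needs Apertium installed):
# deformatted = run_filter("Hi/Hello")
```

Taking the command as a parameter keeps the helper usable with other deformatters and makes it easy to try out with a stand-in command.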

Thank you, will give it a try!

alexeyev avatar Jul 28 '23 04:07 alexeyev