apertium-apy

Chained translation accumulates unknown word marks

Open sushain97 opened this issue 8 years ago • 10 comments

e.g.

meow (en->es) *meow

*meow (es->fr) **meow

so

meow (en->fr) **meow

instead of

meow (en->fr) *meow
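A toy reproduction of the accumulation, as a sketch only: `toy_translate` and `chain` are hypothetical stand-ins, not APy functions, and the assumption is that each step treats a leading `*` as punctuation and marks the bare word again.

```python
import re


def toy_translate(text, known_words):
    """Hypothetical stand-in for one language pair.

    Every word not found in known_words is prefixed with '*',
    mimicking the unknown word mark.
    """
    def mark(match):
        word = match.group(0)
        return word if word in known_words else "*" + word

    # '*' is not a word character, so a mark left by an earlier
    # step stays in place and the word after it is marked again.
    return re.sub(r"\w+", mark, text)


def chain(text, steps):
    """Feed the output of each step into the next one."""
    for known_words in steps:
        text = toy_translate(text, known_words)
    return text


# Two steps (say en->es, then es->fr), neither of which knows "meow".
print(chain("meow", [set(), set()]))
```

Each step whose dictionary lacks the word adds one more mark in this model, so the number of asterisks grows with the length of the chain.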

sushain97 avatar Jan 09 '17 05:01 sushain97

@unhammer ideas (aside from manually removing the marks)?

sushain97 avatar Jan 09 '17 05:01 sushain97

I think we'll just have to regex them away (or collapse them into one), like the "remove error marks" thing already does.
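A minimal sketch of the "into one" variant, assuming only the asterisk mark matters and that a mark always sits directly in front of a word. `REPEATED_MARKS` and `collapse_unknown_marks` are illustrative names, not what APy actually uses.

```python
import re

# Two or more asterisks directly in front of a word character.
# The lookahead leaves a free-standing "**" in ordinary text alone.
REPEATED_MARKS = re.compile(r"\*{2,}(?=\w)")


def collapse_unknown_marks(text):
    """Reduce accumulated unknown word marks to a single '*'."""
    return REPEATED_MARKS.sub("*", text)


print(collapse_unknown_marks("**meow says the *gato"))
```

Running this on the final output of the chain would keep exactly one mark per unknown word, however many steps added one.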

(Ideally, lt-proc could switch on the fly between marking and non-marking using some stream signal. I'd rather not start separate pipelines with and without marks; that sounds like more complex if-thens and more memory usage.)

unhammer avatar Jan 09 '17 09:01 unhammer

@shardulc, could you take care of this? The regex for error marks is already in the APy code iirc.

sushain97 avatar Jan 09 '17 15:01 sushain97

44783f93 fixes this, and is only a three-line change after #43 is merged. Not opening a PR right now because all previous commits for chained translation show up too.

shardulc avatar Jan 16 '17 19:01 shardulc

@shardulc there are unknown word marks other than *, such as #. There should be a more comprehensive regex floating around somewhere in APy or html-tools.

sushain97 avatar Jan 17 '17 00:01 sushain97

@sushain97 I took the one in that commit directly from here, which only has the asterisks. Is a different regex used anywhere?

shardulc avatar Jan 17 '17 00:01 shardulc

Hm... perhaps not.

https://github.com/goavki/streamparser/blob/master/streamparser.py#L28-L38

sushain97 avatar Jan 17 '17 00:01 sushain97

In released pairs, we shouldn't have # (if the language data was completely testvoc'd), so I don't think we should worry about those.

unhammer avatar Jan 17 '17 08:01 unhammer

Can I work on this issue?

SAP-20 avatar Jan 15 '21 15:01 SAP-20

@SAP-20, you don't have to ask for that. If you want to fix it, just fix it and submit a PR.

TinoDidriksen avatar Jan 15 '21 15:01 TinoDidriksen