Relax continuity constraints on Annotation

Open jerinphilip opened this issue 2 years ago • 7 comments

Related: https://github.com/browsermt/bergamot-translator/issues/355#issuecomment-1043094869, https://github.com/browsermt/bergamot-translator/issues/298

I have proposed https://github.com/jelmervdl/firefox-translations/issues/5 for the experimental extension. A next feature on my wishlist is an explanation view like the one below. It is a little far-fetched, but someday I'd like the extension to show the visualization usually used to depict attention as an explanation of the translation.

image

(Screenshot taken from https://distill.pub/2016/augmented-rnns/, so we already have JS available under a permissive license, hopefully).

#298 indicates that we are editing the annotation to get HTML in, but the subword tokens now include tag information. This is not ideal when we want to build things like the above. One solution is to relax the continuity constraints, which were imposed to tie Annotation closely to SentencePiece, to just a constraint of monotonic byte ranges.
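To make the difference between the two constraints concrete, here is a minimal Python sketch. `ByteRange`, `is_contiguous` and `is_monotonic` are hypothetical names for illustration, not bergamot-translator's API. The assumption is that "continuity" means each token starts exactly where the previous one ends, while "monotonic" only requires tokens to be ordered and non-overlapping, which leaves room for markup bytes between them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical half-open byte range [begin, end) into the text."""
    begin: int
    end: int


def is_contiguous(ranges: List[ByteRange]) -> bool:
    # Strict constraint: every token starts where the previous one ended.
    return all(a.end == b.begin for a, b in zip(ranges, ranges[1:]))


def is_monotonic(ranges: List[ByteRange]) -> bool:
    # Relaxed constraint: tokens are well-formed, ordered and non-overlapping,
    # but gaps (for example, bytes holding tags) are allowed between them.
    well_formed = all(r.begin <= r.end for r in ranges)
    ordered = all(a.end <= b.begin for a, b in zip(ranges, ranges[1:]))
    return well_formed and ordered


plain = b"Hello world"
plain_tokens = [ByteRange(0, 6), ByteRange(6, 11)]    # "Hello ", "world"

marked = b"Hello <b>world</b>"
marked_tokens = [ByteRange(0, 6), ByteRange(9, 14)]   # tags fall in the gaps
```

With the relaxed constraint, the tokens of `marked` still slice out tag-free text (`marked[9:14]` is `b"world"`), which is what a token-level visualization would need.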

We may look at adding methods on Annotation to insert markup in between tokens, rather than doing it externally, so that the whole data structure stays consistent. This would also make it simple to support other markups when we get to building those.
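As a rough sketch of what such a method could look like (in Python for brevity; the class name `AnnotatedText` and its `insert` method are hypothetical, not existing bergamot-translator code): markup is inserted at a byte offset that lies on a token boundary, and every token at or after that offset is shifted, so the ranges stay valid without any external bookkeeping.

```python
from typing import List, Tuple


class AnnotatedText:
    """Hypothetical container: text bytes plus half-open token byte ranges."""

    def __init__(self, text: bytes, ranges: List[Tuple[int, int]]):
        self.text = text
        self.ranges = list(ranges)

    def insert(self, offset: int, markup: bytes) -> None:
        # Refuse to split a token: markup may only go between tokens.
        for begin, end in self.ranges:
            if begin < offset < end:
                raise ValueError("offset falls inside a token")
        shift = len(markup)
        self.text = self.text[:offset] + markup + self.text[offset:]
        # Tokens starting at or after the insertion point move right.
        self.ranges = [
            (b + shift, e + shift) if b >= offset else (b, e)
            for b, e in self.ranges
        ]

    def token(self, index: int) -> bytes:
        begin, end = self.ranges[index]
        return self.text[begin:end]


annotated = AnnotatedText(b"Hello world", [(0, 6), (6, 11)])
annotated.insert(6, b"<b>")     # opening tag before "world"
annotated.insert(14, b"</b>")   # closing tag after "world" (offset after shift)
```

After both insertions the text holds the tags while each token range still covers only the original token bytes. Note that this only works if the ranges are allowed to have gaps, that is, under the relaxed monotonic constraint.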

Opening this issue to discuss.

jerinphilip avatar Feb 26 '22 11:02 jerinphilip

"Attention is not not Explanation" https://aclanthology.org/D19-1002.pdf "Attention is not Explanation" https://aclanthology.org/N19-1357.pdf

kpu avatar Feb 26 '22 15:02 kpu

One difficulty I noticed is that HTML is not just text with tags added in between. Some characters, like & and < need to be replaced with &amp; and &lt;.
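A small Python sketch of why this matters for byte ranges: escaping changes token lengths, so the ranges have to be recomputed along with the text. The function `escape_with_ranges` is a hypothetical helper, and it assumes the ranges are contiguous, cover the whole text, and fall on UTF-8 character boundaries.

```python
import html
from typing import List, Tuple


def escape_with_ranges(
    text: bytes, ranges: List[Tuple[int, int]]
) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Escape &, < and > token by token, rebuilding the byte ranges."""
    out = bytearray()
    new_ranges = []
    for begin, end in ranges:
        token = text[begin:end].decode("utf-8")
        # quote=False leaves quotation marks alone; only &, <, > are replaced.
        escaped = html.escape(token, quote=False).encode("utf-8")
        new_ranges.append((len(out), len(out) + len(escaped)))
        out += escaped
    return bytes(out), new_ranges


source = b"AT&T < IBM"
source_ranges = [(0, 4), (4, 6), (6, 10)]   # "AT&T", " <", " IBM"
escaped_text, escaped_ranges = escape_with_ranges(source, source_ranges)
```

Here the first token grows from 4 to 8 bytes and the second from 2 to 5, so every later range moves even though no tag was inserted.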

jelmervdl avatar Feb 26 '22 15:02 jelmervdl

Good enough for HTML replacement, good enough for the visualization. Attention is all we need 🤗. Besides, we can build UI etc with the existing ByteRange derived Annotation and replace attention with whatever future mechanism becomes "explanation" in a similar setting.

Some characters, like & and < need to be replaced with &amp; and &lt;.

We don't need to relax the continuity constraints for this, but supporting such operations in the Annotation class itself could be useful for a wider range of applications. Could you point me to where these edits happen? That would help me study which operations the HTML code currently performs on Annotation could be abstracted, pushed down, and reused across other markups as well.

No hurry though; we can slowly incubate this idea.

jerinphilip avatar Feb 27 '22 10:02 jerinphilip

The alignments come from guided alignment trained from fastalign, not from attention. These alignments are what drive the HTML alignment.
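For readers unfamiliar with the idea, here is a deliberately simplified Python sketch of how a soft alignment matrix could drive tag placement. It is not the actual bergamot-translator HTML code: the function `transfer_tags`, the matrix layout (one row per target token, one column per source token) and the argmax rule are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def transfer_tags(
    alignment: Sequence[Sequence[float]],
    source_tags: Sequence[Optional[str]],
) -> List[Optional[str]]:
    """Give each target token the tag of its most strongly aligned source token."""
    target_tags = []
    for row in alignment:
        # Index of the source token with the highest alignment score.
        best_source = max(range(len(row)), key=row.__getitem__)
        target_tags.append(source_tags[best_source])
    return target_tags


# Source: "the <b>red</b> house"   Target: "la casa roja"
source_tags = [None, "b", None]
alignment = [
    [0.8, 0.1, 0.1],   # "la"   aligns mostly with "the"
    [0.1, 0.1, 0.8],   # "casa" aligns mostly with "house"
    [0.1, 0.7, 0.2],   # "roja" aligns mostly with "red"
]
target_tags = transfer_tags(alignment, source_tags)
```

In this toy example the bold tag follows "red" to "roja" even though the word order changes, which is the property that makes alignments useful for HTML.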

kpu avatar Feb 27 '22 19:02 kpu

image

NB: Continuity constraints are not relaxed; I just got the visualization from the screenshot in the first comment working. Looks pretty. There is some resizing ugliness.

jerinphilip avatar Mar 22 '22 10:03 jerinphilip

image

NB: Continuity constraints are not relaxed; I just got the visualization from the screenshot in the first comment working. Looks pretty. There is some resizing ugliness.

Will you share the code of your improved version?

lagleki avatar Jun 02 '22 19:06 lagleki

Will you share the code of your improved version?

https://github.com/jerinphilip/bergamot-translator/pull/88 (This is early experimental code; it will take a while to merge to main.)

jerinphilip avatar Jun 03 '22 08:06 jerinphilip