marian-dev
XML Markup Scheme for Marian Decoder
I might be interested in taking another stab at a XML markup scheme with specified translations as in Moses:
The house is <x translation="klein"> small </x> .
This does require the decoding algorithm to have access to the attention states for each output word. This sounds rather tricky to me, since the decoder is model-agnostic. Any suggestions?
I guess starting in models/encdec.h, if you look at this line:
https://github.com/marian-nmt/marian-dev/blob/master/src/models/encdec.h#L355
you can see how to get access to the attention weights of a decoder; currently that only works for RNN-based decoders. I haven't unified this yet. You would need to push it through to ScorerWrapper in
https://github.com/marian-nmt/marian-dev/blob/master/src/translator/scorers.h#L59
as it holds the EncDec object during translation. Then you are already at the level of the translator.
Adding actual XML to our very single-purpose input/output formats may actually be the more challenging part :)
How are you planning to make the actual substitution and integration? Only overwrite the target token, or do you want to force an embedding for "klein" and steer the decoding process?
@emjotde 1. I had tried the "only overwrite the target token" case. I chose to use the attention matrix to do the replacement.
(En->Fr, En: I played <unk>FOOTBALL</unk>,
Fr: j'ai joué au UNK. After replacement: j'ai joué au FOOTBALL.
Attention: 2-3.)
But sometimes it fails to generate UNK or to produce a correct attention matrix after translation (complicated sentences may fail as well; there is no guarantee).
2. Forcing an embedding while decoding is a great idea, but it may run into trouble with OOVs.
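For reference, the attention-based replacement described in 1. boils down to an argmax over the attention row of each UNK target position. Here is a minimal self-contained sketch of that idea (not the actual Marian code; all names are made up for illustration):

```cpp
#include <string>
#include <vector>

// Replace each UNK token in the target with the source token that
// receives the highest attention weight at that target position.
// attention[t][s] = weight of source position s for target position t.
std::vector<std::string> replaceUnk(
    const std::vector<std::string>& source,
    const std::vector<std::string>& target,
    const std::vector<std::vector<float>>& attention,
    const std::string& unkToken = "UNK") {
  std::vector<std::string> out(target);
  for(size_t t = 0; t < target.size(); ++t) {
    if(target[t] != unkToken)
      continue;
    // hard alignment: argmax over the attention row for position t
    size_t best = 0;
    for(size_t s = 1; s < attention[t].size(); ++s)
      if(attention[t][s] > attention[t][best])
        best = s;
    out[t] = source[best];
  }
  return out;
}
```

As noted above, this only works when the attention distribution is actually peaked on the right source word; for complicated sentences the argmax can point elsewhere, which is exactly the "no guarantee" problem.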
Reviving this, was this handled at some point? If not, I can add this functionality (overwriting the target token) in the PR here: https://github.com/ebay-hlt/marian-dev/pull/1
I started a branch (xml) in marian-nmt / marian-dev. I am currently working on it and hope to get somewhere in the next 1-2 weeks (implementing a grid search approach).
@phikoehn I have already implemented XML handling in the above PR (minimal effort is needed for alternate translations). If it makes it any easier for you, I can issue a PR to marian-dev and you can build on top of it.
Looking over the code, this introduces a word replacement feature, where the decoder is expected to produce special output tokens (starting with $) that are then replaced during output printing. This works fine for named entities (this seems to be the intention), but not for words that influence word choices around them. I am planning to modify the search algorithm (using "grid search") to also allow multi-word XML tags.
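For readers following along: the grid-search idea amounts to partitioning the beam into sub-beams by how many constraints each hypothesis has covered so far, and pruning within each sub-beam, so that partially constrained hypotheses are not crowded out by unconstrained ones. A toy sketch of that allotment (a simplified illustration, not what the branch actually does):

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

struct Hyp {
  float cost;      // log-probability, higher (closer to 0) is better
  size_t covered;  // number of XML constraints satisfied so far
};

// Partition hypotheses into sub-beams by constraint coverage, then keep
// the best (beamSize / numSubbeams) hypotheses within each sub-beam.
std::vector<Hyp> pruneBySubbeam(std::vector<Hyp> hyps,
                                size_t beamSize,
                                size_t numConstraints) {
  size_t subbeams = numConstraints + 1;  // coverage counts 0..numConstraints
  size_t perSubbeam = std::max<size_t>(1, beamSize / subbeams);
  std::map<size_t, std::vector<Hyp>> bySubbeam;
  for(auto& h : hyps)
    bySubbeam[h.covered].push_back(h);
  std::vector<Hyp> kept;
  for(auto& kv : bySubbeam) {
    auto& sub = kv.second;
    std::sort(sub.begin(), sub.end(),
              [](const Hyp& a, const Hyp& b) { return a.cost > b.cost; });
    for(size_t i = 0; i < sub.size() && i < perSubbeam; ++i)
      kept.push_back(sub[i]);
  }
  return kept;
}
```

The "allotted"/"redistributed" lines in the logs further down correspond to this kind of per-subbeam slot allocation (with unused slots handed back to other sub-beams).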
Yeah. In eBay we have placeholders starting with $ and the idea for now was just to replace those with the entities.
I am making good progress with the full XML implementation.
One code question. At around line 220 of src/translator/beam_search.h, I plan to create additional hypotheses that match the XML constraints. For this I will need to access the cost of an arbitrary target word for any hypothesis. This score is somewhere in totalCosts, which is an Expr, and my understanding is that its value may live only on the GPU (during GPU decoding). How can I pull out a single float value from it in the CPU-bound beam search management?
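In case it helps later readers: assuming the cost tensor is laid out row-major as [hypotheses x vocabulary] (and that a single element can be copied to the host, e.g. via the tensor's element accessor on totalCosts->val()), the lookup is just a flattened index. A host-side sketch, with a plain vector standing in for the copied device buffer:

```cpp
#include <cstddef>
#include <vector>

// totalCosts is conceptually a [beamSize x vocabSize] matrix flattened
// row-major, so the score of target word w for hypothesis h sits at
// index h * vocabSize + w. Here a host vector stands in for the device
// buffer; in Marian one would copy that one element from the GPU.
float costOf(const std::vector<float>& flatCosts,
             size_t vocabSize,
             size_t hypIdx,
             size_t wordIdx) {
  return flatCosts[hypIdx * vocabSize + wordIdx];
}
```

Note that copying one float per query from the GPU is slow if done per hypothesis per step; batching the lookups (or copying the whole row once) is likely preferable.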
I completed the XML constraint decoding. The code currently checked into the xml branch is fully functional, also with batched decoding, but could use some stress testing (which I am planning to do) and currently spews out excessive amounts of debugging information. --- I would appreciate help with two things to make it ready to be merged into the main branch.
(1) The memory management for XML-related data structures does not use smart pointers and is currently leaking memory. Can someone who understands the Marian pointers take a look at that? I can give detailed guidance on where objects are allocated.
(2) The code is not merged with the latest version of Marian, partly because I have a problem with compiling the latest version of Marian with the ancient libraries that are on my systems.
I'm happy to help.
Ad 2) Can you provide a compilation log? I am not aware of changes we introduced apart from changes to Warnings. If that hurts you we should take a look at that.
Actually, maybe open a separate issue for that in marian-dev and we will solve that first?
I updated the branch xml with the current master, and put it into the branch xml-merge, as the merge wasn't straightforward. The code is still far from ready for a pull request. With the xml branch, I am getting segmentation faults for beam size 1 and random illegal memory access errors, I think from translator/nth_element.cu. This happens even without --xml-input. The code will also need to be cleaned of debug messages and refactored.
@phikoehn Could you provide some usage examples or test cases with expected outputs? I am not sure how to use the constrained decoding. I would like to add a few regression tests and take a look at the errors.
To use this, the flag "--xml-input" must be added. Input with specified translations is presented as
das haus ist <a translation="big"> gross </a> .
where the name of the tag (a) does not matter, but "translation" has to be used to specify a single translation. The translation has to be in byte-pair-encoded format in the target language.
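For anyone wanting to experiment with this format, extracting the (source phrase, specified translation) pairs can be sketched as follows. This is an illustrative stand-alone parser, not the code in the branch, and it assumes well-formed, non-nested tags with a double-quoted translation attribute:

```cpp
#include <string>
#include <utility>
#include <vector>

// Parses input of the form:
//   das haus ist <a translation="big"> gross </a> .
// into the markup-stripped text plus (tagged phrase, translation) pairs.
struct XmlParse {
  std::string plainText;
  std::vector<std::pair<std::string, std::string>> options;
};

XmlParse parseXmlInput(const std::string& line) {
  XmlParse result;
  size_t pos = 0;
  while(true) {
    size_t open = line.find('<', pos);
    if(open == std::string::npos) {
      result.plainText += line.substr(pos);  // no more tags
      break;
    }
    result.plainText += line.substr(pos, open - pos);
    size_t openEnd = line.find('>', open);
    // extract the value of translation="..."
    size_t attr = line.find("translation=\"", open);
    size_t attrEnd = line.find('"', attr + 13);
    std::string translation = line.substr(attr + 13, attrEnd - attr - 13);
    // the tagged source phrase sits between the opening and closing tag
    size_t close = line.find("</", openEnd);
    std::string phrase = line.substr(openEnd + 1, close - openEnd - 1);
    result.plainText += phrase;
    result.options.emplace_back(phrase, translation);
    pos = line.find('>', close) + 1;
  }
  return result;
}
```

A real implementation would also need to record token positions of the tagged span (for the coverage bookkeeping) and handle malformed markup gracefully.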
Yes, I tried something like that (following some Moses examples), but didn't see any difference. I was decoding with the Edinburgh WMT16 EN-DE model.
Did you specify "--xml-input" when calling the decoder?
Did you never see an effect or just not always?
You should try short examples first just to be sure that it is called correctly.
Regarding the memory access issues: check how XmlOption, XmlOptionCovered, XmlOptions, XmlOptionsList, and XmlOptionCoveredList objects are allocated. The best way would be to use smart pointers, but I could not pull that off.
I tried, for example:
echo 'this is <n translation="schlecht"> bad </n>' \
| ../marian-dev/build-xml/marian-decoder -c marian.en-de.yml --xml-input
and got:
das ist schlimm
It seems xmlSearch is executed, but I don't see any difference between this and echo "this is bad" without --xml-input. The n-best list scores are the same. Logs attached.
[2018-12-07 22:04:10] [config] workspace: 512
[2018-12-07 22:04:10] [config] xml-alignment-weight: 1
[2018-12-07 22:04:10] [config] xml-input: true
[2018-12-07 22:04:10] [config] xml-violation-penalty: 10
...
process xml for this is bad
vocabs_.size() = 1
called processXml
CorpusBase::addWordsToSentenceTuple
list is 0x91d0f60
batch->setXmlOptionsList(xmlOptionsList);
setXmlOptionsList 0x91d0f60
xopsl->size() = 1, 0x91c99e0
xops->size() = 0
batch->setXmlOptionsList(xmlOptionsList); OK
[2018-12-07 22:04:13] [data] Loading vocabulary from Yaml/JSON file /mnt/zisa0/romang/marian/mrt/models/wmt16_systems/en-de/vocab.de.json
pulling xmlOptionsList 0x91d0f60
xmlOptions 0x91c99e0
Hypothesis xmlOptions 0x91c99e0
still alive and kicking 4
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.0101005 das:12 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 0 cost -5.71356 dies:88 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 0 cost -7.04192 es:21 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.0101005 das:12 ...[0]
beam 0 hyp 0>1 cost -5.71356 dies:88 ...[0]
beam 0 hyp 0>2 cost -7.04192 es:21 ...[0]
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: das:12,-0.0101005
merge beam 0 from subbeam 0, hyp 1: dies:88,-5.71356
merge beam 0 from subbeam 0, hyp 2: es:21,-7.04192
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.0101005 das ...[0]
beam 0 hyp 0>1 cost -5.71356 dies ...[0]
beam 0 hyp 0>2 cost -7.04192 es ...[0]
beam 0 hyp 0 cost -0.0101005 [0] das
beam 0 hyp 1 cost -5.71356 [0] dies
beam 0 hyp 2 cost -7.04192 [0] es
pruning the beam
histories[i]->Add(beams[i]
histories[i]->Add(beams[i] OK
remaining beam sizes 3
beam 0 hyp 0 cost -0.0101005 [0] das
beam 0 hyp 1 cost -5.71356 [0] dies
beam 0 hyp 2 cost -7.04192 [0] es
DONE WITH LOOP, localBeamSize now 3
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.0170677 ist:13 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 1 cost -5.72473 ist:13 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 0 cost -6.66541 geht:171 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.0170677 ist:13 ...[0] das
beam 0 hyp 1>1 cost -5.72473 ist:13 ...[0] dies
beam 0 hyp 0>2 cost -6.66541 geht:171 ...[0] das
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: ist:13,-0.0170677
merge beam 0 from subbeam 0, hyp 1: ist:13,-5.72473
merge beam 0 from subbeam 0, hyp 2: geht:171,-6.66541
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.0170677 ist ...[0] das
beam 0 hyp 1>1 cost -5.72473 ist ...[0] dies
beam 0 hyp 0>2 cost -6.66541 geht ...[0] das
beam 0 hyp 0 cost -0.0170677 [0] ist das
beam 0 hyp 1 cost -5.72473 [1] ist dies
beam 0 hyp 2 cost -6.66541 [0] geht das
pruning the beam
histories[i]->Add(beams[i]
histories[i]->Add(beams[i] OK
remaining beam sizes 3
beam 0 hyp 0 cost -0.0170677 [0] ist das
beam 0 hyp 1 cost -5.72473 [1] ist dies
beam 0 hyp 2 cost -6.66541 [0] geht das
DONE WITH LOOP, localBeamSize now 3
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.715873 schlimm:16626 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 0 cost -1.32059 schlecht:2927 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 0 cost -2.13292 übel:34649 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.715873 schlimm:16626 ...[0] ist das
beam 0 hyp 0>1 cost -1.32059 schlecht:2927 ...[0] ist das
beam 0 hyp 0>2 cost -2.13292 übel:34649 ...[0] ist das
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: schlimm:16626,-0.715873
merge beam 0 from subbeam 0, hyp 1: schlecht:2927,-1.32059
merge beam 0 from subbeam 0, hyp 2: übel:34649,-2.13292
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.715873 schlimm ...[0] ist das
beam 0 hyp 0>1 cost -1.32059 schlecht ...[0] ist das
beam 0 hyp 0>2 cost -2.13292 übel ...[0] ist das
beam 0 hyp 0 cost -0.715873 [0] schlimm ist das
beam 0 hyp 1 cost -1.32059 [0] schlecht ist das
beam 0 hyp 2 cost -2.13292 [0] übel ist das
pruning the beam
histories[i]->Add(beams[i]
histories[i]->Add(beams[i] OK
remaining beam sizes 3
beam 0 hyp 0 cost -0.715873 [0] schlimm ist das
beam 0 hyp 1 cost -1.32059 [0] schlecht ist das
beam 0 hyp 2 cost -2.13292 [0] übel ist das
DONE WITH LOOP, localBeamSize now 3
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.849237 :0 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 1 cost -1.38329 :0 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 2 cost -2.27102 :0 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.849237 :0 ...[0] schlimm ist das
beam 0 hyp 1>1 cost -1.38329 :0 ...[0] schlecht ist das
beam 0 hyp 2>2 cost -2.27102 :0 ...[0] übel ist das
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: :0,-0.849237
merge beam 0 from subbeam 0, hyp 1: :0,-1.38329
merge beam 0 from subbeam 0, hyp 2: :0,-2.27102
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.849237 ...[0] schlimm ist das
beam 0 hyp 1>1 cost -1.38329 ...[0] schlecht ist das
beam 0 hyp 2>2 cost -2.27102 ...[0] übel ist das
beam 0 hyp 0 cost -0.849237 [0]
beam 0 hyp 1 cost -1.38329 [1]
beam 0 hyp 2 cost -2.27102 [2]
pruning the beam
histories[i]->Add(beams[i]
Add 4 0 -0.849237
Add 4 1 -1.38329
Add 4 2 -2.27102
histories[i]->Add(beams[i] OK
remaining beam sizes 0
DONE WITH LOOP, localBeamSize now 0
[2018-12-07 22:04:15] Best translation 0 : das ist schlimm
[2018-12-07 22:04:15] Total time: 1.234949s wall, 1.100000s user + 0.130000s system = 1.230000s CPU (99.6%)
I uploaded a model that I have been using for testing to http://www.statmt.org/system.tgz.
It also includes 10 test sentences; the first one is:
<a translation="Trump">Obama</a> empfängt Net@@ any@@ ahu
Which gives the translation:
Trump welcomes Net@@ any@@ ahu
The model would not mistake these two presidents otherwise.
@phikoehn Could you upload vocabs again? The archive contains only symlinks.
I used your command (from translate-ten-xml.sh) and input file (preprocessed with different BPE codes) with http://data.statmt.org/wmt16_systems/de-en/, and the first sentence is translated into:
Obama welcomes Netanyahu
I uploaded these to http://www.statmt.org/train.bpe.de.json and http://www.statmt.org/train.bpe.en.json
Thanks! I get the proper translation now. It works with other models too, but I needed to set a higher --xml-violation-score (fixed in the branch "xml-merge") and, more importantly, make sure that the words from XML tags occur in the target vocabulary.
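Since a constraint whose tokens are missing from the target vocabulary can never be produced by the decoder, a pre-decoding sanity check along these lines may save others some debugging (a hypothetical helper, not part of the branch):

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// The specified translation must be pre-segmented with the target-side
// BPE, and every resulting token must exist in the target vocabulary;
// otherwise the constraint is unsatisfiable. Returns the offending tokens.
std::vector<std::string> missingFromVocab(const std::string& translation,
                                          const std::set<std::string>& vocab) {
  std::vector<std::string> missing;
  std::istringstream in(translation);
  std::string token;
  while(in >> token)
    if(vocab.count(token) == 0)
      missing.push_back(token);
  return missing;
}
```

Emitting a warning (or dropping the constraint) when this returns a non-empty list would make the "no visible effect" failure mode above much easier to diagnose.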
Okay, great!
I need to do some experimentation with the xml-violation-score setting, and with the feature generally (in a similar fashion to related work). I will also clean up the extravagant debug logging after that.
What I need help with is (a) merge with the latest master and (b) memory allocation for the XML options.
I've already merged the code with the current master in the branch "xml-merge". I've also fixed decoding without --xml-input (there was an illegal memory access error). I think there is more to do before creating a pull request and putting this into master:
- Adding regression tests
- Removing the debug logging
- General refactoring, including memory allocation, dropping multiple vocabulary loadings, cleaning up the beam search code, etc.
- CPU implementation (masking in n-th element)
- Checking why beam size 1 gives bad translations
I'll add a bunch of regression tests tomorrow, and I'm also happy to help with the other things, so I will start some refactoring and cleaning in my spare time.
Hi all,
I would like to ask whether you have had time to work on this feature. I tested it on my side and it works fine with my models. However, I get OOM errors in some cases.
Thank you
Have you tried the "xml-merge" branch? I've fixed some more things, added a bunch of regression tests, and removed bare pointers. The code still needs refactoring, and the higher memory usage may be caused by the extensive debug logging.
Yes, I tried this one, about a week ago or so. I will update the code later today or tomorrow and will give it a try.
I removed all the debugging messages. It is now more than twice as fast.
I will rebase with the current master and do more refactoring. If I remember correctly, some code was added in other classes just to make the debugging possible. I also left myself more TODOs when doing the first merge with master.