marian-dev
XML Markup Scheme for Marian Decoder
I might be interested in taking another stab at a XML markup scheme with specified translations as in Moses:
The house is <x translation="klein"> small </x> .
This does require the decoding algorithm to have access to the attention states for each output word. This sounds rather tricky to me, since the decoder is model-agnostic. Any suggestions?
I guess starting in models/encdec.h, if you look at this line:
https://github.com/marian-nmt/marian-dev/blob/master/src/models/encdec.h#L355
you can see how to get access to the attention weights of a decoder; currently that only works for RNN-based decoders. I haven't unified this yet. You would need to push it through to ScorerWrapper in
https://github.com/marian-nmt/marian-dev/blob/master/src/translator/scorers.h#L59
as it holds the EncDec object during translation. Then you are already at the level of the translator.
Adding actual XML to our very single-purpose input/output formats may actually be the more challenging part :)
How are you planning to make the actual substitution and integration? Only overwrite the target token, or do you want to force an embedding for "klein" and steer the decoding process?
@emjotde 1. I had tried the "only overwrite the target token" case. I chose to use the attention matrix to do the replacement.
(En->Fr, En: I played <unk>FOOTBALL</unk>,
Fr: j'ai joué au UNK. After replacement: j'ai joué au FOOTBALL.
Attention: 2-3.)
But sometimes it fails to generate UNK or to produce a correct attention matrix after translation (complicated sentences may fail as well; there is no guarantee).
2. Forcing an embedding while decoding is a great idea, but it may run into trouble with OOVs.
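For reference, the attention-based replacement described in 1. boils down to an argmax over the attention row of each UNK target position. Here is a minimal self-contained sketch of that idea (not the actual Marian code; all names are made up for illustration):

```cpp
#include <string>
#include <vector>

// Replace each UNK token in the target with the source token that
// receives the highest attention weight at that target position.
// attention[t][s] = weight of source position s for target position t.
std::vector<std::string> replaceUnk(
    const std::vector<std::string>& source,
    const std::vector<std::string>& target,
    const std::vector<std::vector<float>>& attention,
    const std::string& unkToken = "UNK") {
  std::vector<std::string> out(target);
  for(size_t t = 0; t < target.size(); ++t) {
    if(target[t] != unkToken)
      continue;
    // hard alignment: argmax over the attention row for position t
    size_t best = 0;
    for(size_t s = 1; s < attention[t].size(); ++s)
      if(attention[t][s] > attention[t][best])
        best = s;
    out[t] = source[best];
  }
  return out;
}
```

As noted above, this only works when the attention distribution is actually peaked on the right source word; for complicated sentences the argmax can point elsewhere, which is exactly the "no guarantee" problem.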
Reviving this, was this handled at some point? If not, I can add this functionality (overwriting the target token) in the PR here: https://github.com/ebay-hlt/marian-dev/pull/1
I started a branch (xml) in marian-nmt / marian-dev. I am currently working on it and hope to get somewhere in the next 1-2 weeks (implementing a grid search approach).
@phikoehn I have already implemented XML handling in the above PR (minimal effort is needed for alternate translations). If it makes it any easier for you, I can issue a PR to marian-dev and you can build on top of it.
Looking over the code, this introduces a word replacement feature, where the decoder is expected to produce special output tokens (starting with $) that are then replaced during output printing. This works fine for named entities (this seems to be the intention), but not for words that influence word choices around them. I am planning to modify the search algorithm (using "grid search") to also allow multi-word XML tags.
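For readers following along: the grid-search idea amounts to partitioning the beam into sub-beams by how many constraints each hypothesis has covered so far, and pruning within each sub-beam, so that partially constrained hypotheses are not crowded out by unconstrained ones. A toy sketch of that allotment (a simplified illustration, not what the branch actually does):

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

struct Hyp {
  float cost;      // log-probability, higher (closer to 0) is better
  size_t covered;  // number of XML constraints satisfied so far
};

// Partition hypotheses into sub-beams by constraint coverage, then keep
// the best (beamSize / numSubbeams) hypotheses within each sub-beam.
std::vector<Hyp> pruneBySubbeam(std::vector<Hyp> hyps,
                                size_t beamSize,
                                size_t numConstraints) {
  size_t subbeams = numConstraints + 1;  // coverage counts 0..numConstraints
  size_t perSubbeam = std::max<size_t>(1, beamSize / subbeams);
  std::map<size_t, std::vector<Hyp>> bySubbeam;
  for(auto& h : hyps)
    bySubbeam[h.covered].push_back(h);
  std::vector<Hyp> kept;
  for(auto& kv : bySubbeam) {
    auto& sub = kv.second;
    std::sort(sub.begin(), sub.end(),
              [](const Hyp& a, const Hyp& b) { return a.cost > b.cost; });
    for(size_t i = 0; i < sub.size() && i < perSubbeam; ++i)
      kept.push_back(sub[i]);
  }
  return kept;
}
```

The "allotted"/"redistributed" lines in the logs further down correspond to this kind of per-subbeam slot allocation (with unused slots handed back to other sub-beams).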
Yeah. In eBay we have placeholders starting with $ and the idea for now was just to replace those with the entities.
I am making good progress with the full XML implementation.
One code question. At around line 220 of src/translator/beam_search.h, I plan to create additional hypotheses that match the XML constraints. For this I will need to access the cost of an arbitrary target word for any hypothesis. This score is somewhere in totalCosts, which is an Expr, and my understanding is that its value may live only on the GPU (during GPU decoding). How can I pull out a single float value from it in the CPU-bound beam search management?
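In case it helps later readers: assuming the cost tensor is laid out row-major as [hypotheses x vocabulary] (and that a single element can be copied to the host, e.g. via the tensor's element accessor on totalCosts->val()), the lookup is just a flattened index. A host-side sketch, with a plain vector standing in for the copied device buffer:

```cpp
#include <cstddef>
#include <vector>

// totalCosts is conceptually a [beamSize x vocabSize] matrix flattened
// row-major, so the score of target word w for hypothesis h sits at
// index h * vocabSize + w. Here a host vector stands in for the device
// buffer; in Marian one would copy that one element from the GPU.
float costOf(const std::vector<float>& flatCosts,
             size_t vocabSize,
             size_t hypIdx,
             size_t wordIdx) {
  return flatCosts[hypIdx * vocabSize + wordIdx];
}
```

Note that copying one float per query from the GPU is slow if done per hypothesis per step; batching the lookups (or copying the whole row once) is likely preferable.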
I completed the XML constraint decoding. The code currently checked into the xml branch is fully functional, also with batched decoding, but could use some stress testing (which I am planning to do) and currently spews out excessive amounts of debugging information. --- I would appreciate help with two things to make it ready to be merged into the main branch.
(1) The memory management for XML-related data structures does not use smart pointers and is currently leaking memory. Can someone who understands the Marian pointers take a look at that? I can give detailed guidance on where objects are allocated.
(2) The code is not merged with the latest version of Marian, partly because I have a problem with compiling the latest version of Marian with the ancient libraries that are on my systems.
I'm happy to help.
Ad 2) Can you provide a compilation log? I am not aware of changes we introduced apart from changes to Warnings. If that hurts you we should take a look at that.
Actually, maybe open a separate issue for that in marian-dev and we will solve that first?
I updated the branch xml with the current master, and put it into the branch xml-merge, as the merge wasn't straightforward. The code is still far from ready for a pull request. With the xml branch, I am getting segmentation faults for beam size 1 and random illegal memory access errors, I think from translator/nth_element.cu. This happens even without --xml-input. The code will also need to be cleaned of debug messages and refactored.
@phikoehn Could you provide some usage examples or test cases with expected outputs? I am not sure how to use the constrained decoding. I would like to add a few regression tests and take a look at the errors.
To use this, the flag "--xml-input" must be added. Input with specified translations is presented as
das haus ist <a translation="big"> gross </a> .
where the name of the tag (a) does not matter, but "translation" has to be used to specify a single translation. The translation has to be in byte-pair-encoded format in the target language.
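For anyone wanting to experiment with this format, extracting the (source phrase, specified translation) pairs can be sketched as follows. This is an illustrative stand-alone parser, not the code in the branch, and it assumes well-formed, non-nested tags with a double-quoted translation attribute:

```cpp
#include <string>
#include <utility>
#include <vector>

// Parses input of the form:
//   das haus ist <a translation="big"> gross </a> .
// into the markup-stripped text plus (tagged phrase, translation) pairs.
struct XmlParse {
  std::string plainText;
  std::vector<std::pair<std::string, std::string>> options;
};

XmlParse parseXmlInput(const std::string& line) {
  XmlParse result;
  size_t pos = 0;
  while(true) {
    size_t open = line.find('<', pos);
    if(open == std::string::npos) {
      result.plainText += line.substr(pos);  // no more tags
      break;
    }
    result.plainText += line.substr(pos, open - pos);
    size_t openEnd = line.find('>', open);
    // extract the value of translation="..."
    size_t attr = line.find("translation=\"", open);
    size_t attrEnd = line.find('"', attr + 13);
    std::string translation = line.substr(attr + 13, attrEnd - attr - 13);
    // the tagged source phrase sits between the opening and closing tag
    size_t close = line.find("</", openEnd);
    std::string phrase = line.substr(openEnd + 1, close - openEnd - 1);
    result.plainText += phrase;
    result.options.emplace_back(phrase, translation);
    pos = line.find('>', close) + 1;
  }
  return result;
}
```

A real implementation would also need to record token positions of the tagged span (for the coverage bookkeeping) and handle malformed markup gracefully.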
Yes, I tried something like that (following some Moses examples), but didn't see any difference. I was decoding with the Edinburgh WMT16 EN-DE model.
Did you specify "--xml-input" when calling the decoder?
Did you never see an effect or just not always?
You should try short examples first just to be sure that it is called correctly.
Regarding the memory access issues: check how XmlOption, XmlOptionCovered, XmlOptions, XmlOptionsList, and XmlOptionCoveredList objects are allocated. The best way would be to use smart pointers, but I could not pull that off.
I tried, for example:
echo 'this is <n translation="schlecht"> bad </n>' \
| ../marian-dev/build-xml/marian-decoder -c marian.en-de.yml --xml-input
and got:
das ist schlimm
It seems xmlSearch is executed, but I don't see any difference between this and echo "this is bad" without --xml-input. The n-best list scores are the same. Logs attached.
[2018-12-07 22:04:10] [config] workspace: 512
[2018-12-07 22:04:10] [config] xml-alignment-weight: 1
[2018-12-07 22:04:10] [config] xml-input: true
[2018-12-07 22:04:10] [config] xml-violation-penalty: 10
...
process xml for this is bad
vocabs_.size() = 1
called processXml
CorpusBase::addWordsToSentenceTuple
list is 0x91d0f60
batch->setXmlOptionsList(xmlOptionsList);
setXmlOptionsList 0x91d0f60
xopsl->size() = 1, 0x91c99e0
xops->size() = 0
batch->setXmlOptionsList(xmlOptionsList); OK
[2018-12-07 22:04:13] [data] Loading vocabulary from Yaml/JSON file /mnt/zisa0/romang/marian/mrt/models/wmt16_systems/en-de/vocab.de.json
pulling xmlOptionsList 0x91d0f60
xmlOptions 0x91c99e0
Hypothesis xmlOptions 0x91c99e0
still alive and kicking 4
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.0101005 das:12 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 0 cost -5.71356 dies:88 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 0 cost -7.04192 es:21 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.0101005 das:12 ...[0]
beam 0 hyp 0>1 cost -5.71356 dies:88 ...[0]
beam 0 hyp 0>2 cost -7.04192 es:21 ...[0]
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: das:12,-0.0101005
merge beam 0 from subbeam 0, hyp 1: dies:88,-5.71356
merge beam 0 from subbeam 0, hyp 2: es:21,-7.04192
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.0101005 das ...[0]
beam 0 hyp 0>1 cost -5.71356 dies ...[0]
beam 0 hyp 0>2 cost -7.04192 es ...[0]
beam 0 hyp 0 cost -0.0101005 [0] das
beam 0 hyp 1 cost -5.71356 [0] dies
beam 0 hyp 2 cost -7.04192 [0] es
pruning the beam
histories[i]->Add(beams[i]
histories[i]->Add(beams[i] OK
remaining beam sizes 3
beam 0 hyp 0 cost -0.0101005 [0] das
beam 0 hyp 1 cost -5.71356 [0] dies
beam 0 hyp 2 cost -7.04192 [0] es
DONE WITH LOOP, localBeamSize now 3
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.0170677 ist:13 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 1 cost -5.72473 ist:13 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 0 cost -6.66541 geht:171 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.0170677 ist:13 ...[0] das
beam 0 hyp 1>1 cost -5.72473 ist:13 ...[0] dies
beam 0 hyp 0>2 cost -6.66541 geht:171 ...[0] das
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: ist:13,-0.0170677
merge beam 0 from subbeam 0, hyp 1: ist:13,-5.72473
merge beam 0 from subbeam 0, hyp 2: geht:171,-6.66541
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.0170677 ist ...[0] das
beam 0 hyp 1>1 cost -5.72473 ist ...[0] dies
beam 0 hyp 0>2 cost -6.66541 geht ...[0] das
beam 0 hyp 0 cost -0.0170677 [0] ist das
beam 0 hyp 1 cost -5.72473 [1] ist dies
beam 0 hyp 2 cost -6.66541 [0] geht das
pruning the beam
histories[i]->Add(beams[i]
histories[i]->Add(beams[i] OK
remaining beam sizes 3
beam 0 hyp 0 cost -0.0170677 [0] ist das
beam 0 hyp 1 cost -5.72473 [1] ist dies
beam 0 hyp 2 cost -6.66541 [0] geht das
DONE WITH LOOP, localBeamSize now 3
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.715873 schlimm:16626 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 0 cost -1.32059 schlecht:2927 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 0 cost -2.13292 übel:34649 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.715873 schlimm:16626 ...[0] ist das
beam 0 hyp 0>1 cost -1.32059 schlecht:2927 ...[0] ist das
beam 0 hyp 0>2 cost -2.13292 übel:34649 ...[0] ist das
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: schlimm:16626,-0.715873
merge beam 0 from subbeam 0, hyp 1: schlecht:2927,-1.32059
merge beam 0 from subbeam 0, hyp 2: übel:34649,-2.13292
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.715873 schlimm ...[0] ist das
beam 0 hyp 0>1 cost -1.32059 schlecht ...[0] ist das
beam 0 hyp 0>2 cost -2.13292 übel ...[0] ist das
beam 0 hyp 0 cost -0.715873 [0] schlimm ist das
beam 0 hyp 1 cost -1.32059 [0] schlecht ist das
beam 0 hyp 2 cost -2.13292 [0] übel ist das
pruning the beam
histories[i]->Add(beams[i]
histories[i]->Add(beams[i] OK
remaining beam sizes 3
beam 0 hyp 0 cost -0.715873 [0] schlimm ist das
beam 0 hyp 1 cost -1.32059 [0] schlecht ist das
beam 0 hyp 2 cost -2.13292 [0] übel ist das
DONE WITH LOOP, localBeamSize now 3
beam=0 i=0 xml status=0/0
beam=0 i=1 xml status=0/0
beam=0 i=2 xml status=0/0
starting beam sizes 3
starting beam sizes 3
beam 0: 0
subbeamCount = 1
ADDING ADDITIONAL HYPOTHESES
REGULAR BEAM EXPANSION
beam 0, subbeam 0: 111
nth->setHypMask
collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 0
collectedCosts[0][0].size() = 0
build new XmlOptionCoveredList for beam 0 hyp 0 cost -0.849237 :0 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 1
collectedCosts[0][0].size() = 1
build new XmlOptionCoveredList for beam 0 hyp 1 cost -1.38329 :0 ...collectedKeys.size() = 1
collectedKeys[beamNo].size() = 1
collectedKeys[beamNo][subbeam].size() = 2
collectedCosts[0][0].size() = 2
build new XmlOptionCoveredList for beam 0 hyp 2 cost -2.27102 :0 ...SUBBEAM 0 COST/KEY
beam 0 hyp 0>0 cost -0.849237 :0 ...[0] schlimm ist das
beam 0 hyp 1>1 cost -1.38329 :0 ...[0] schlecht ist das
beam 0 hyp 2>2 cost -2.27102 :0 ...[0] übel ist das
allotted: 3/3
toBeRedistributed=0 3/3redistributed: 3/3
merge beam 0 from subbeam 0, hyp 0: :0,-0.849237
merge beam 0 from subbeam 0, hyp 1: :0,-1.38329
merge beam 0 from subbeam 0, hyp 2: :0,-2.27102
outCosts.size() = 3
outCosts.size() = 3, localBeamSize = 3
beam 0 hyp 0>0 cost -0.849237 ...[0] schlimm ist das
beam 0 hyp 1>1 cost -1.38329 ...[0] schlecht ist das
beam 0 hyp 2>2 cost -2.27102 ...[0] übel ist das
beam 0 hyp 0 cost -0.849237 [0]
beam 0 hyp 1 cost -1.38329 [1]
beam 0 hyp 2 cost -2.27102 [2]
pruning the beam
histories[i]->Add(beams[i]
Add 4 0 -0.849237
Add 4 1 -1.38329
Add 4 2 -2.27102
histories[i]->Add(beams[i] OK
remaining beam sizes 0
DONE WITH LOOP, localBeamSize now 0
[2018-12-07 22:04:15] Best translation 0 : das ist schlimm
[2018-12-07 22:04:15] Total time: 1.234949s wall, 1.100000s user + 0.130000s system = 1.230000s CPU (99.6%)
I uploaded a model that I have been using for testing to http://www.statmt.org/system.tgz.
It also includes 10 test sentences; the first one is:
<a translation="Trump">Obama</a> empfängt Net@@ any@@ ahu
Which gives the translation:
Trump welcomes Net@@ any@@ ahu
The model would not mistake these two presidents otherwise.
@phikoehn Could you upload vocabs again? The archive contains only symlinks.
I used your command (from translate-ten-xml.sh) and input file (preprocessed with different BPE codes) with http://data.statmt.org/wmt16_systems/de-en/, and the first sentence is translated into:
Obama welcomes Netanyahu
I uploaded these to http://www.statmt.org/train.bpe.de.json and http://www.statmt.org/train.bpe.en.json
Thanks! I get the proper translation now. It works with other models too, but I needed to set a higher --xml-violation-score (fixed in the branch "xml-merge") and, more importantly, make sure that the words from XML tags occur in the target vocabulary.
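Since a constraint whose tokens are missing from the target vocabulary can never be produced by the decoder, a pre-decoding sanity check along these lines may save others some debugging (a hypothetical helper, not part of the branch):

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// The specified translation must be pre-segmented with the target-side
// BPE, and every resulting token must exist in the target vocabulary;
// otherwise the constraint is unsatisfiable. Returns the offending tokens.
std::vector<std::string> missingFromVocab(const std::string& translation,
                                          const std::set<std::string>& vocab) {
  std::vector<std::string> missing;
  std::istringstream in(translation);
  std::string token;
  while(in >> token)
    if(vocab.count(token) == 0)
      missing.push_back(token);
  return missing;
}
```

Emitting a warning (or dropping the constraint) when this returns a non-empty list would make the "no visible effect" failure mode above much easier to diagnose.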
Okay, great!
I need to do some experimentation with the xml-violation-score setting, and with the feature generally (in a similar fashion to related work). I will also clean up the extravagant debug logging after that.
What I need help with is (a) merge with the latest master and (b) memory allocation for the XML options.
I've already merged the code with the current master in the branch "xml-merge". I've also fixed decoding without --xml-input (there was an illegal memory access error). I think there is more to do before creating a pull request and putting this into master:
- Adding regression tests
- Removing the debug logging
- General refactoring, including memory allocation, dropping multiple vocabulary loadings, cleaning up the beam search code, etc.
- CPU implementation (masking in n-th element)
- Checking why beam size 1 gives bad translations
I'll add a bunch of regression tests tomorrow, and I'm also happy to help with the other things, so I will start some refactoring and cleaning in my spare time.
Hi all,
I would like to ask whether you have had time to work on this feature. I tested it on my side and it works fine with my models. However, I get OOM errors in some cases.
Thank you
Have you tried the "xml-merge" branch? I've fixed some more things, added a bunch of regression tests, and removed bare pointers. The code still needs refactoring, and the higher memory usage may be caused by the extensive debug logging.
Yes, I tried this one, about a week ago or so. I will update the code later today or tomorrow and will give it a try.
I removed all the debugging messages. It is now more than twice as fast.
I will rebase with the current master and do more refactoring. If I remember correctly, some code was added in other classes just to make the debugging possible. I also left myself more TODOs when doing the first merge with master.