No quantization of parameters with names ending in `_Wt`
Description
`marian-conv` does not work properly with models trained with both `tied-embeddings` and `tied-embeddings-all` set to false.
This PR removes quantization of parameters with names ending in `_Wt`, which are used for the vocabularies when `tied-embeddings` and `tied-embeddings-all` are both set to false (see the logic around `tiedParam_` in `mlp::Output::lazyConstruct()` in `src/layers/generic.cpp`, and, for transformer models, `DecoderTransformer::lazyCreateOutputLayer()` in `src/models/transformer.h`). If either `tied-embeddings` or `tied-embeddings-all` is set to true, the vocabulary parameter is instead named `Wemb` or ends in `_Wemb`; those parameters are already excluded from quantization.
This PR fixes a bug: issue #683.
List of changes: parameters with names ending in `_Wt` are added to the logic that selects parameters to leave unquantized.
Added dependencies: none
How to test
marian-conv -f model.npz -t model.bin -g packed8avx512
echo 'test' | marian-decoder -b <beam-size> --cpu-threads 1 -m model.bin -v vocab.src.spm vocab.trg.spm
Without this PR, decoding fails: the error is `Error: Actual pathScore (-inf) is lower than INVALID_PATH_SCORE (-3.40282e+38)??` when the beam size is 2 or 3, and `Error: No hypotheses in n-best list??` when the beam size is 1 (presumably because quantizing the untied output-layer weights corrupts the logits, making every path score invalid). With this PR, translation proceeds normally.
Also compare decoding results with and without this PR for models generated with `marian-conv -f model.npz -t model.bin -g packed8avx2` and with `marian-conv -f model.npz -t model.bin -g packed16`.
Tested on Linux, built with `cmake -DUSE_SENTENCEPIECE:BOOL=ON -DCOMPILE_CPU:BOOL=ON -DUSE_FBGEMM:BOOL=ON ..`.
Checklist
- [x] I have tested the code manually
- [x] I have read and followed CONTRIBUTING.md
@emjotde @snukky This fix would resolve the issue. In general, though, this converter is a temporary solution, since it depends on string patterns in the weight names. We may want an input file that lists the weights to exclude. In any case, I will not block this PR.
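As a purely illustrative sketch of that suggestion (marian-conv has no such option today; the file format and the function name `loadExcludeList` are assumptions):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Illustrative only: read a list of parameter-name patterns to exclude
// from quantization, one per line (e.g. "_Wemb", "_Wt"), so the converter
// would not depend on hard-coded string patterns.
std::vector<std::string> loadExcludeList(const std::string& path) {
  std::vector<std::string> patterns;
  std::ifstream in(path);
  for (std::string line; std::getline(in, line);)
    if (!line.empty())
      patterns.push_back(line);
  return patterns;
}
```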