marian-dev

No quantization of parameter with name ending in _Wt

Open robberlang opened this issue 4 years ago • 1 comments

Description

marian-conv does not work properly with models trained with tied-embeddings and tied-embeddings-all both set to false. This PR removes quantization of parameters whose names end in _Wt, the name used for the vocab parameters in that configuration (see the logic around tiedParam_ in mlp::Output::lazyConstruct() in src/layers/generic.cpp and, for transformer models, DecoderTransformer::lazyCreateOutputLayer() in src/models/transformer.h).

If either tied-embeddings or tied-embeddings-all is set to true, the parameter name for the vocab is Wemb or ends in _Wemb. These parameters are not quantized.
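The name-based rule described above can be sketched as a small C++ predicate. This is only an illustration under stated assumptions, not the actual marian-conv code: the function names endsWith and skipQuantization are hypothetical, and the sketch covers only the three name patterns mentioned in this description (Wemb, names ending in _Wemb, and names ending in _Wt).

```cpp
#include <string>

// Hypothetical helper: returns true if `name` ends with `suffix`.
bool endsWith(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size()
      && name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical predicate (an assumption for illustration, not marian's API):
// returns true for parameters that should be left unquantized.
//  - "Wemb" or "*_Wemb": vocab parameters when embeddings are tied
//  - "*_Wt":             vocab parameters when tied-embeddings and
//                        tied-embeddings-all are both false
bool skipQuantization(const std::string& name) {
  return name == "Wemb" || endsWith(name, "_Wemb") || endsWith(name, "_Wt");
}
```

With this sketch, skipQuantization("encoder_Wemb") and skipQuantization("decoder_ff_logit_out_Wt") return true, while a name such as "decoder_l1_ffn_W1" returns false. The parameter names here are used only as examples of the patterns.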

This PR fixes a bug, issue: #683

List of changes: Parameters with names ending in _Wt are added to the set of parameters that are not quantized.

Added dependencies: none

How to test

    marian-conv -f model.npz -t model.bin -g packed8avx512
    echo 'test' | marian-decoder -b <beam-size> --cpu-threads 1 -m model.bin -v vocab.src.spm vocab.trg.spm

Without this PR, the error message is "Error: Actual pathScore (-inf) is lower than INVALID_PATH_SCORE (-3.40282e+38)??" when the beam size is 2 or 3, and "Error: No hypotheses in n-best list??" when the beam size is 1. With this PR, normal translation occurs.

Also compare decode results, with and without this PR, for models generated with:

    marian-conv -f model.npz -t model.bin -g packed8avx2
    marian-conv -f model.npz -t model.bin -g packed16

Describe how you have tested your code, including OS and the cmake command.

OS: Linux

    cmake -DUSE_SENTENCEPIECE:BOOL=ON -DCOMPILE_CPU:BOOL=ON -DUSE_FBGEMM:BOOL=ON ..

Checklist

  • [x] I have tested the code manually
  • [x] I have read and followed CONTRIBUTING.md

robberlang, Oct 29 '20 19:10

@emjotde @snukky This fix would resolve the issue. But, in general, this converter is a temporary solution as it depends on the string patterns in the weight names. We may want to have an input file to list the weights? In any case, I will not block this PR itself.

ykim362, Sep 06 '22 19:09