marian-dev
Error in decoding using packed binary model (packed8avx512)
Bug description
A fatal error occurs when decoding with a model that was converted to the packed8avx512 GEMM type from a .npz-formatted model using marian-conv. I have a few models where this happens, and I also have models where it does not. The error message is "Error: Actual pathScore (-inf) is lower than INVALID_PATH_SCORE (-3.40282e+38)??" when the beam size is 2 or 3, and "Error: No hypotheses in n-best list??" when the beam size is 1.
How to reproduce
marian-conv -f model.npz -t model.bin -g packed8avx512
echo 'test' | marian-decoder -b <beam-size> --cpu-threads 1 -m model.bin -v vocab.src.spm vocab.trg.spm
Context
- Marian version: v1.9.28; b28905a2 2020-07-21 11:32:08 +0100
- CMake command: cmake -DUSE_SENTENCEPIECE:BOOL=ON -DCOMPILE_CPU:BOOL=ON -DCOMPILE_CUDA:BOOL=OFF -DUSE_CUDNN:BOOL=OFF -DUSE_FBGEMM:BOOL=ON -DUSE_NCCL:BOOL=OFF ..
- Output of --build-info all: marian-build-info-all.txt
- Log files: decode-b1.log decode-b2.log

No problems occur when using a model converted to the float32 type with marian-conv.
Whoa. That's really odd. Can you share the model by any chance?
You should have received an email with info for getting the model and vocab files. Thanks.
Got the e-mail, thanks.
A bit late, but... Could it be that you have an AVX2 machine? Does packed8avx2 work for you?
The machine I'm using supports AVX-512. Trying with packed8avx2 gives this: "Error: FBGEMM doesn't allow to use AVX2 packing order on AVX512 CPUs" and "Error: Aborted from void marian::cpu::variant::fbgemmPacked8Gemm(marian::Tensor, marian::Tensor, marian::Tensor, size_t, size_t, size_t, int, int) in src/tensors/cpu/fbgemm/packed_gemm.cpp:558".
I also tried building Marian against the latest upstream FBGEMM, but got the same results.
I've figured it out. The problem is that marian-conv is quantizing decoder_ff_logit_out_Wt. That parameter exists only because I had trained the model with tied-embeddings and tied-embeddings-all both set to false. If I modify ExpressionGraphPackable::packAndSave in src/tensors/cpu/fbgemm/expression_graph_packable.h to exclude decoder_ff_logit_out_Wt from quantization, then all is well.
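For anyone who wants to replicate the workaround, below is a minimal, self-contained sketch of the kind of name-based exclusion I mean. The helper shouldPackParam and the demo loop are hypothetical, not Marian's API; in the actual change, the check sits inside the parameter loop of ExpressionGraphPackable::packAndSave, which decides per parameter whether to pack it with FBGEMM or save it as plain float32.

#include <iostream>
#include <set>
#include <string>

// Hypothetical helper illustrating the workaround: decide, per parameter
// name, whether marian-conv should pack the tensor to int8 with FBGEMM
// or leave it as float32. In the real fix, this check lives inside the
// parameter loop of ExpressionGraphPackable::packAndSave in
// src/tensors/cpu/fbgemm/expression_graph_packable.h.
static bool shouldPackParam(const std::string& name) {
  // decoder_ff_logit_out_Wt only exists when the model was trained with
  // tied-embeddings and tied-embeddings-all both false; quantizing it
  // led to -inf path scores at decode time, so exclude it.
  static const std::set<std::string> excluded = {"decoder_ff_logit_out_Wt"};
  return excluded.count(name) == 0;
}

int main() {
  // Demo with an ordinary weight name and the problematic one.
  for (const std::string name : {"encoder_l1_self_Wq", "decoder_ff_logit_out_Wt"})
    std::cout << name << " -> "
              << (shouldPackParam(name) ? "pack int8" : "keep float32") << "\n";
  return 0;
}

A cleaner long-term fix would probably key the exclusion off the parameter's role (the logit output layer) rather than a hard-coded name, but the name check was enough to confirm the diagnosis.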