What is the right way to get an 8-bit model?
1. Which version or branch of Marian should be compiled: marian-master, marian-dev, or https://github.com/afaji/Marian/tree/fixed-quant ? 2. Can the teacher model be trained with marian-master?
fixed-quant is dead since it has been merged into master, and @afaji needs to pay attention to issue #24 to remove references to it.
You can get a slower 8-bit model, without output matrix quantization, from https://github.com/marian-nmt/marian-dev , or a faster 8-bit model from https://github.com/browsermt/marian-dev .
The teacher can be trained with anything.
You mean I can get a fixed-quant (16-bit or 8-bit) model by adding the flag `--quantize-bits 16` using marian master?
@XapaJIaMnu the 8-bit documentation is lacking.
@yandaowei sorry for the lack of documentation. Could you please check the steps described here: https://github.com/browsermt/students/tree/master/train-student#5-optional-8bit-quantization and see if everything is clear?
The quantisation finetuning is completely optional and is described here: https://github.com/browsermt/students/tree/master/train-student/finetune . Basically, what it does is take an fp32 model and damage it in such a way that it corresponds to a quantised model. It achieves this by limiting the GEMM outputs to only 255 distinct numbers (in the 8-bit case) or 65535 distinct numbers (in the 16-bit case). When trained for a bit with this scheme, the model learns to work with the reduced set of distinct values and performs better when quantised.
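In command form, the finetuning boils down to restarting a normal training run from the trained fp32 student with the quantisation flag added. A minimal sketch, assuming a trained student at `student.npz`; all file names and the build path are placeholders, the data/vocab options mirror an ordinary training invocation, and only `--quantize-bits` (the flag quoted earlier in this thread) is specific to this step. Check the finetune README linked above for the exact recipe.

```bash
# Hedged sketch of quantisation finetuning: a regular Marian training run
# restarted from the trained fp32 student, with fake-quantised values
# (--quantize-bits 8 limits training to 255 distinct values, the "damage"
# described above; use 16 for the 16-bit case).
# All paths and file names here are placeholders.
~/marian-dev/build/marian \
    --model student.npz \
    --train-sets corpus.src corpus.trg \
    --vocabs vocab.spm vocab.spm \
    --quantize-bits 8 \
    --after-epochs 1
```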
So I need two steps to get the 8-bit model. Step 1: follow the doc https://github.com/browsermt/students/tree/master/train-student#5-optional-8bit-quantization to get the 8-bit model. Step 2: finetune the 8-bit model as described in https://github.com/browsermt/students/tree/master/train-student/finetune . Is that correct?
- Train an FP32 model as usual.
- Optional: finetune the FP32 model with 8-bit damage. This step is mostly useful only if the model is particularly small (on the order of tiny11 in our WNGT paper). If it's larger, rounding to 8-bit works out of the box.
- Optional: if using any of the 8-bit quantization methods that pre-compute the scaling factor for activations (these contain `Alpha` in the command line for the next step), run a sample decode through to tune the scaling factor.
- Convert to 8-bit format (a command sketch follows below).
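Concretely, the last step uses `marian-conv` from the browsermt fork. A hedged sketch, assuming `intgemm8` as the plain (non-Alpha) 8-bit target; the file names and build path are placeholders, and the students README linked above is the authoritative recipe:

```bash
# Hedged sketch of the final conversion step. intgemm8 is the plain 8-bit
# format without precomputed activation scaling factors; file names and
# the build path are placeholders, not from this thread.
~/browsermt-marian-dev/build/marian-conv \
    -f student.finetuned.npz \
    -t student.intgemm8.bin \
    --gemm-type intgemm8
```

The `Alpha` variants of `--gemm-type` additionally consume the activation statistics collected by the sample decode in the previous step; the exact invocation for that is in the linked README.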