
Quantization

Open Wronskia opened this issue 7 years ago • 21 comments

Hello,

I am working on both image classification examples (CIFAR/ImageNet) and am struggling to understand where quantization appears in your examples. I looked in the prototxt files provided for the sparsification phase, but there is no quantization_param parameter in the convolutional/FC layers.

Wronskia avatar Jul 31 '17 10:07 Wronskia

Hi,

I hope you are referring to the examples given at: https://github.com/tidsp/caffe-jacinto-models

We recently found out that the best way to do quantization is to do it in the inference phase - without any special training for quantization. That way, we can take any trained model and run inference with quantization.

We will soon add this feature to test/inference, and you will be able to set a flag at the test phase to enable inference with quantization.

However, we have not removed the feature of training with quantization - it is simply not used in the scripts, since that is not the currently recommended flow.

I hope this clarifies.

mathmanu avatar Jul 31 '17 10:07 mathmanu

Hello, I was indeed referring to the examples in the caffe-jacinto-models repo.

Thank you for your quick answer,

Best,

Yassine

Wronskia avatar Jul 31 '17 11:07 Wronskia

Hello,

Actually, I need to do quantization during training and not at inference. Do I have to link in the quantization-related parts of the source code?

Thank you very much in advance,

Yassine

Wronskia avatar Aug 02 '17 07:08 Wronskia

You can use the flag quantization_start_iter in the solver params. For example, quantization_start_iter: 2000 will start the quantization from iteration 2000 of training.
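To illustrate, here is a minimal solver.prototxt sketch; only quantization_start_iter comes from this discussion, and the net path and other hyper-parameters below are placeholders:

```
# solver.prototxt (sketch; values are placeholders except quantization_start_iter)
net: "train.prototxt"              # hypothetical path to the training net
base_lr: 0.001
lr_policy: "fixed"
max_iter: 10000
snapshot: 2000
snapshot_prefix: "snapshots/quantized"

# start applying quantization from iteration 2000 of training
quantization_start_iter: 2000
```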

Training with quantization may not converge if it is not done correctly. I recommend doing it as a separate training stage after all your current training stages are complete.

But as I said earlier, we have not observed a significant advantage from doing quantization at the training stage, so it is better to do it at the inference stage. I leave that choice to you.

mathmanu avatar Aug 02 '17 08:08 mathmanu

Thank you for your answer.

I am trying to set up the parameters so that every conv layer is quantized as follows: quantization_param { precision: MINIFLOAT mant_bits: 5 exp_bits: 2 }. Should I set this in the train.prototxt file or in the solver?

Wronskia avatar Aug 02 '17 09:08 Wronskia

You just have to define it once in solver.prototxt. It will be added to all the necessary layers automatically. If you don't define it, DYNAMIC_FIXED_POINT will be assumed when you specify quantization_start_iter.
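A minimal sketch of that single definition in solver.prototxt, reusing the quantization_param block from your question (the other solver fields are placeholders):

```
# solver.prototxt (sketch; only the quantization lines come from this thread)
net: "train.prototxt"              # hypothetical path
quantization_start_iter: 2000      # begin quantizing at iteration 2000

# defined once here; it gets added to all the necessary layers automatically
quantization_param {
  precision: MINIFLOAT
  mant_bits: 5
  exp_bits: 2
}
```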

Note: we have not observed a significant advantage from doing quantization at the training stage, so it is better to do it at the inference stage. I am adding this notice since you are trying out something that is not recommended, and I don't want others reading this thread to get confused.

mathmanu avatar Aug 02 '17 10:08 mathmanu

I have not been using training with quantization in this code for some time, so I am not completely sure if it works correctly. Instead, in our recommended flow, the inference engine takes care of quantization, without anything special done at the training stage.

Have you seen the package https://github.com/pmgysel/caffe? That may be a better source for you if you really want to try training with quantization in Caffe.

mathmanu avatar Aug 02 '17 10:08 mathmanu

Thank you for your answer, I will have a look at the link.

Wronskia avatar Aug 04 '17 09:08 Wronskia

Hello,

I would really like to perform quantization during the fine-tuning and sparsification using the same tool. I saw that in the caffe-0.16 version you enabled "quantize: true", which is defined in test.prototxt to enable quantization at inference. However, I was wondering whether you will soon release a version that does quantization at training?

Thank you, Best

Wronskia avatar Oct 02 '17 12:10 Wronskia

Quantization at test stage is sufficient as the accuracy drop seems to be small. I have not tried training with quantization recently, as I am quite happy with the results that I get with quantization at test.

However, training with quantization is also supported. You can set quantize: true in your train.prototxt.
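As a rough sketch, the flag would sit at the top of the net definition; the layer below is just a placeholder for context, not taken from the repository's prototxt files:

```
# train.prototxt (sketch; only "quantize: true" comes from this thread)
name: "ExampleNet"       # hypothetical net name
quantize: true           # enable quantization during the fine-tuning stage

layer {
  name: "conv1"          # placeholder layer, for context only
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 32 kernel_size: 3 }
}
```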

Note that you should probably enable this quantize flag only in a fine-tuning stage after finishing the initial round of training (otherwise it may not converge - not sure).

You may also want to write out the trained net in text format to see the quantization ranges and shift factors. You can refer to the function SnapshotToProtoLog() in the caffe-0.15 branch to understand how to do this. (I have not moved this function to caffe-0.16, as I consider this feature of training with quantization deprecated.)

mathmanu avatar Oct 02 '17 14:10 mathmanu

Hello manu,

Thank you for your answer.

I tried quantize: true and it works. However, I have two issues:

1. I want to be able to set the bit precision to an arbitrary value for all layers. For example, I want to be able to fine-tune my network with 16, 18, 19 bits of precision, etc. (for all layers). How can I set this?
2. If I fine-tune a network using both sparsification and quantization during training, will the quantization affect the zero weights?

Thank you very much, Best

Wronskia avatar Oct 06 '17 14:10 Wronskia

Notice the definition of the following in src/caffe/proto/caffe.proto

```proto
// Quantization params for the net
message NetQuantizationParameter {
  // frame/iter at which quantization is introduced
  optional int32 quantization_start = 1 [default = 1];

  // indicates whether the quantized range is a power of 2 or not.
  // if the input is quantized with a power2 range, the output quantized value
  // can be obtained by shifting with fracbits
  optional bool power2_range = 12 [default = false];

  optional QuantizationParameter_Precision precision = 2 [default = QuantizationParameter_Precision_DYNAMIC_FIXED_POINT];
  optional QuantizationParameter_Rounding rounding_scheme = 3 [default = QuantizationParameter_Rounding_NEAREST];
  optional uint32 bitwidth_activations = 4 [default = 8];

  // 8-bits doesn't seem to be working for classification if the previous frame's range is used.
  optional uint32 bitwidth_weights = 5 [default = 12];

  optional bool quantize_weights = 8 [default = true];
  optional bool quantize_activations = 9 [default = true];

  optional int32 display_quantization = 10 [default = 2000];
  optional bool insert_quantization_param = 11 [default = true];

  // apply offset to make the quantized range unsigned and optimal.
  optional bool apply_offset_activations = 13 [default = false];
  optional bool apply_offset_weights = 14 [default = true];
}
```

These parameters control how quantization is done for the entire net. If this is not specified, the default values take over. Notice that the default bitwidth for activations is 8 bits and for weights is 12 bits. However, you can specify different values as follows. Add the following at the beginning of your train.prototxt (if quantization is used in training) and test.prototxt (if quantization is used in test):

```
net_quantization_param {
  # specify the parameters that you want to change from their default values.
  # others need not be specified, although there is no harm in specifying them.
  bitwidth_activations: 8
  bitwidth_weights: 8

  # STOCHASTIC rounding is recommended for training,
  # whereas test should do NEAREST rounding for efficient realization
  rounding_scheme: QuantizationParameter_Rounding_STOCHASTIC
}
```

mathmanu avatar Oct 06 '17 14:10 mathmanu

Applying quantization and sparsity together (or applying quantization on a sparsified net) should work - but I did face some issues recently and have not looked carefully into that.

In test, you can apply quantization and measure the sparsity at the same time, to see whether you get the expected sparsity.

On your test command line, you can check the sparsity by adding --display_sparsity=1, as shown in the examples in caffe-jacinto-models/trained/.../test/run.sh.

I expect that you may face some issues here - the code has changed recently and I have not spent time validating all the cases. Let me know if you face issues.

mathmanu avatar Oct 06 '17 14:10 mathmanu

Thanks a lot :)))

Best.

Wronskia avatar Oct 06 '17 14:10 Wronskia

Keeping this issue open for some more time, as this is an interesting conversation and provides help to someone who wants to try the same.

mathmanu avatar Oct 06 '17 14:10 mathmanu

Sorry for the rushed close.

Wronskia avatar Oct 06 '17 14:10 Wronskia

No problem. Documentation is missing for some of these aspects - so these conversations help. Feel free to ask more questions if you have any.

mathmanu avatar Oct 06 '17 15:10 mathmanu

If you see that the sparsity vanishes when you apply quantization, try setting these two flags to false and let me know: apply_offset_activations: false and apply_offset_weights: false.
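For reference, a sketch of where those flags could sit, based on the NetQuantizationParameter definition quoted earlier (the bitwidths are placeholders):

```
net_quantization_param {
  bitwidth_activations: 8          # placeholder bitwidths
  bitwidth_weights: 8

  # turn the offsets off if sparsity vanishes after quantization
  apply_offset_activations: false
  apply_offset_weights: false
}
```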

mathmanu avatar Oct 06 '17 20:10 mathmanu

@mathmanu I am confused about the quantization procedure at the eval phase. As defined in src/caffe/net.cpp, the dynamic frac-bits are set before the forward pass. Does that mean the quantization ranges of the inputs and outputs come from the last mini-batch? Or am I misunderstanding something? Can you briefly summarize the whole quantization procedure?

jnjaby avatar Mar 16 '18 04:03 jnjaby

My usage model:
Training phase: train using Caffe, within the NVIDIA DIGITS environment.
Inference phase: use the deploy.prototxt and .caffemodel file in a separate embedded environment (without Caffe).

Owing to the limited memory in the embedded environment, I would like the trained model to incorporate quantization. Is there a way to support quantization during training in DIGITS? DIGITS doesn't allow me to hand-edit the solver.prototxt file.

pbbhat avatar May 07 '18 12:05 pbbhat

Hi @mathmanu, thank you for this great work. If I have a ResNet caffemodel that was trained with another Caffe version, can I use your Caffe to do inference with quantize: true in test.prototxt to speed it up? Thanks in advance.

liangzimei avatar Jun 25 '18 13:06 liangzimei