Jaehong Kim

22 comments by Jaehong Kim

The QAT model is only for training, so the weights of a QAT model are in float form. For 8-bit quantization, we do the actual quantization during TFLite conversion. (Inference also just...
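For context, a minimal sketch of that flow, assuming a toy Keras model (the model and training setup here are hypothetical; the point is that quantize_model only injects fake-quant for training, and the int8 conversion happens in the TFLite converter):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Hypothetical float model; any supported Keras model works here.
base_model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# QAT wraps layers with fake-quant nodes; weights stay float32 during training.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# ... train qat_model as usual ...

# The actual int8 quantization happens here, at TFLite conversion time.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```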

We still don't fully support subclass models yet. The model we mentioned in that blog post means we wrote some code to support a kind of subclass model, which implemented...

We have an experimental n-bit quantization scheme here: https://github.com/tensorflow/model-optimization/tree/master/tensorflow_model_optimization/python/core/quantization/keras/experimental/default_n_bit We plan to release an example that shows how to use this default_n_bit scheme on the TensorFlow official vision model....
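A minimal sketch of how that scheme can be plugged in, assuming the annotate-then-apply flow; since this lives under experimental/, the constructor arguments shown (num_bits_weight, num_bits_activation) may change between releases:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.python.core.quantization.keras.experimental.default_n_bit import (
    default_n_bit_quantize_scheme,
)

# Hypothetical model just for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),
])

# Annotate the whole model, then apply the experimental n-bit scheme
# (here: 4-bit weights, 8-bit activations).
annotated = tfmot.quantization.keras.quantize_annotate_model(model)
nbit_model = tfmot.quantization.keras.quantize_apply(
    annotated,
    scheme=default_n_bit_quantize_scheme.DefaultNBitQuantizeScheme(
        num_bits_weight=4, num_bits_activation=8))
```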

Hi, the model input shape is (224, 224, 3), so you can try size=(1, 224, 224, 3) in the representative_dataset function. Thanks.
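A minimal sketch of such a representative_dataset, assuming post-training quantization from a saved model (the path is a placeholder, and real calibration should yield actual preprocessed images rather than random data):

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # One calibration sample per yield, batched to (1, 224, 224, 3).
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()
```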

Would you please provide more details? Did you try to use TF1 QAT on a TF1 model and it raised that error?

If you run inference on a QAT model, it already simulates 4 bits with fake-quant. The only difference is that we use float32 ops. Basically, the input and weights are quant-dequanted by the injected fake-quant....
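A small sketch of that quant-dequant behavior with TensorFlow's fake-quant op (the min/max range here is illustrative): the tensor is snapped to a 4-bit grid but remains float32, which is exactly what QAT inference computes with.

```python
import tensorflow as tf

x = tf.constant([[-1.2, 0.3, 0.75, 2.0]], dtype=tf.float32)

# Quant-dequant ("fake quant"): values are rounded onto a 4-bit grid,
# but the output tensor is still float32.
x_fq = tf.quantization.fake_quant_with_min_max_args(
    x, min=-2.0, max=2.0, num_bits=4)
print(x_fq)  # float32 values restricted to 2**4 = 16 levels
```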

Hi, would you please give us some examples? We usually assume BNs will be folded (fused) into a nearby layer for optimization. I'd like to know some use cases where it...
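For reference, this is the standard Conv+BN folding identity that "folded into a nearby layer" refers to, as a small numpy sketch (the function name and shapes are mine, not from TFMOT):

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-3):
    """Fold BatchNorm parameters into the preceding conv's weights/bias.

    w: conv kernel of shape (kh, kw, in_ch, out_ch); BN stats are per out_ch.
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel scale
    w_folded = w * scale                 # broadcasts over the out_ch axis
    b_folded = beta + (b - mean) * scale
    return w_folded, b_folded
```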

AFAIK, even if you apply prune_low_magnitude to sublayers of your subclass model, the pruning scheduling logic (callback-based) doesn't work during training because it can't find your pruned...
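By contrast, the usual working setup puts the pruned layers in a functional (or sequential) model, where the callback can discover the pruning wrappers. A minimal sketch with synthetic data and arbitrary layer sizes:

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Pruned layers live in a functional model, so UpdatePruningStep can find them.
inputs = tf.keras.Input(shape=(784,))
x = tfmot.sparsity.keras.prune_low_magnitude(
    tf.keras.layers.Dense(128, activation='relu'))(inputs)
outputs = tf.keras.layers.Dense(10)(x)
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Synthetic data just to make the sketch runnable.
x_train = np.random.rand(256, 784).astype('float32')
y_train = np.random.randint(0, 10, size=(256,))

# The scheduling logic is callback-based: without UpdatePruningStep the
# pruning step never advances and training errors out.
model.fit(x_train, y_train, epochs=2,
          callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```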

It looks like a rounding/numerical-precision issue (e.g. zero_point_from_min=127.5, but sometimes zero_point_from_min=127.49999). AFAIK, the fake_quant op implementation has some numerical error: [as-is] (clamped_shifted / nudged_scale_repl + 0.5f).floor() * nudged_scale_repl +...
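A tiny sketch of that failure mode with illustrative values (not the kernel's actual inputs): nominally 38.25 / 0.3 = 127.5, so add-0.5-then-floor should give 128, but float32 division yields 127.49999..., and the rounding lands on 127 instead.

```python
import numpy as np

# Nominally 38.25 / 0.3 == 127.5, so round-half-up should give 128.
q = np.float32(38.25) / np.float32(0.3)
print(repr(q))                         # ~127.49999 due to float32 error
print(np.floor(q + np.float32(0.5)))   # 127.0, not the expected 128.0
```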

Hi, would you please explain in more detail how you made Q_quant? Is it quantized if you only use the B_quant class?