Sehoon Kim

Results: 9 comments by Sehoon Kim

Thanks for your interest! First of all, HF and Fairseq (the current repo) are two different implementations of I-BERT and are independent of each other. You can use one of...

It is not restricted to specific tasks, so you can finetune it on your own task.

For quantization operations (e.g., QuantLinear), we normally use a round-to-nearest policy instead of floor, since these operations are sensitive to rounding errors and round-to-nearest generally produces less error than floor....
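
As a rough illustration (not the repo's actual QuantLinear code), uniform quantization with round-to-nearest keeps the worst-case error at half a quantization step, whereas floor can be off by nearly a full step:

```python
import torch

def quantize(x, scale, rounding="nearest"):
    # Uniform quantization of x with step size `scale`, then dequantize
    # so the error against the original value can be compared directly.
    q = x / scale
    q = torch.round(q) if rounding == "nearest" else torch.floor(q)
    return q * scale

x = torch.randn(10000)
scale = 0.1
err_nearest = (x - quantize(x, scale, "nearest")).abs().max()  # <= scale / 2
err_floor = (x - quantize(x, scale, "floor")).abs().max()      # approaches scale
print(err_nearest.item(), err_floor.item())
```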

We did not open-source our code for TensorRT deployment. We are planning to deploy our model using TVM, which I think is a more suitable framework for an open-source project,...

Thanks for your interest! I should first mention that this PyTorch implementation of I-BERT only searches for the integer parameters (i.e., performs quantization-aware training) that minimize the accuracy degradation as compared...
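
For context, quantization-aware training typically simulates integer arithmetic with "fake" quantization and a straight-through estimator so the rounded weights can still be trained. The sketch below is a generic illustration under that assumption, not I-BERT's actual code:

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Simulated 8-bit quantization with a straight-through estimator,
    so gradients pass through the non-differentiable rounding step."""
    @staticmethod
    def forward(ctx, x, scale):
        return torch.round(x / scale).clamp(-128, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: gradient of x is passed unchanged; scale gets none.
        return grad_output, None

w = torch.nn.Parameter(torch.randn(4, 4))
scale = w.detach().abs().max() / 127      # per-tensor symmetric scale
w_q = FakeQuant.apply(w, scale)           # used in place of w during training
```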

In my case, the NaN issue arose when no hook was created ([here](https://github.com/AntixK/PyTorch-Model-Compare/blob/main/torch_cka/cka.py#L98)), in which case no feature was returned [here](https://github.com/AntixK/PyTorch-Model-Compare/blob/main/torch_cka/cka.py#L165), resulting in the division-by-zero error at L180....
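
A minimal, repo-agnostic sketch of this failure mode: forward hooks are only attached to layer names that actually exist in the model, so a name that matches nothing silently leaves the feature dictionary empty, which is what eventually triggers the division by zero downstream:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# "3" is a deliberately wrong layer name: no module matches it, so no hook fires.
wanted = ["2", "3"]
for name, module in model.named_modules():
    if name in wanted:
        module.register_forward_hook(save_output(name))

model(torch.randn(16, 8))
missing = [n for n in wanted if n not in features]
if missing:
    print(f"No features captured for {missing}; this is where the CKA "
          "computation ends up dividing by zero.")
```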

Yes, the current implementation reports the FLOPs averaged across all the examples in the validation set.
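
In other words, the reported number is a mean over per-example counts rather than a total or a single-example figure; a tiny illustration with hypothetical values:

```python
# Hypothetical per-example FLOP counts over the validation set.
flops_per_example = [1.2e9, 1.5e9, 0.9e9]
avg_flops = sum(flops_per_example) / len(flops_per_example)
print(f"Reported FLOPs: {avg_flops:.3e}")  # mean, not sum
```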

Hi, yes, pre-trained models like BERT and RoBERTa cannot be finetuned with sequence lengths longer than the maximum sequence length they were pre-trained on, as doing so will violate...
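
One way to check the limit before finetuning is to inspect the checkpoint's position-embedding size; a short sketch using the Hugging Face `transformers` library, with `roberta-base` as an example checkpoint:

```python
from transformers import AutoConfig, AutoTokenizer

# The number of learned position embeddings caps the usable sequence length.
config = AutoConfig.from_pretrained("roberta-base")
print(config.max_position_embeddings)   # 514 for RoBERTa (512 tokens + offsets)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
print(tokenizer.model_max_length)       # 512
```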