Sehoon Kim

Results: 9 comments by Sehoon Kim

Thanks for your interest! First of all, HF and Fairseq (the current repo) are two different implementations of I-BERT and are independent of each other. You can use one of...

It is not restricted to specific tasks, so you can finetune it on your own task.

For quantization operations (e.g., QuantLinear), we normally use a round-to-nearest policy instead of floor, since these operations are sensitive to rounding errors and round-to-nearest generally produces less error than floor....
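
As a rough illustration (not the repo's actual QuantLinear code), uniform quantization with round-to-nearest keeps the worst-case error at half a quantization step, whereas floor can be off by nearly a full step:

```python
import torch

def quantize(x, scale, rounding="nearest"):
    # Uniform quantization of x with step size `scale`, then dequantize
    # so the error against the original value can be compared directly.
    q = x / scale
    q = torch.round(q) if rounding == "nearest" else torch.floor(q)
    return q * scale

x = torch.randn(10000)
scale = 0.1
err_nearest = (x - quantize(x, scale, "nearest")).abs().max()  # <= scale / 2
err_floor = (x - quantize(x, scale, "floor")).abs().max()      # approaches scale
print(err_nearest.item(), err_floor.item())
```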

We did not open-source our code for TensorRT deployment. We are planning to deploy our model using TVM, which I think is a more suitable framework for an open-source project,...

Thanks for your interest! I should first mention that this PyTorch implementation of I-BERT only searches for the integer parameters (i.e., performs quantization-aware training) that minimize the accuracy degradation as compared...
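
For context, quantization-aware training typically simulates integer arithmetic with "fake" quantization and a straight-through estimator so the rounded weights can still be trained. The sketch below is a generic illustration under that assumption, not I-BERT's actual code:

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Simulated 8-bit quantization with a straight-through estimator,
    so gradients pass through the non-differentiable rounding step."""
    @staticmethod
    def forward(ctx, x, scale):
        return torch.round(x / scale).clamp(-128, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: gradient of x is passed unchanged; scale gets none.
        return grad_output, None

w = torch.nn.Parameter(torch.randn(4, 4))
scale = w.detach().abs().max() / 127      # per-tensor symmetric scale
w_q = FakeQuant.apply(w, scale)           # used in place of w during training
```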

In my case, the NaN issue arose when no hook was created ([here](https://github.com/AntixK/PyTorch-Model-Compare/blob/main/torch_cka/cka.py#L98)), in which case no feature was returned [here](https://github.com/AntixK/PyTorch-Model-Compare/blob/main/torch_cka/cka.py#L165), resulting in the division-by-zero error at L180....
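
A minimal, repo-agnostic sketch of this failure mode: forward hooks are only attached to layer names that actually exist in the model, so a name that matches nothing silently leaves the feature dictionary empty, which is what eventually triggers the division by zero downstream:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# "3" is a deliberately wrong layer name: no module matches it, so no hook fires.
wanted = ["2", "3"]
for name, module in model.named_modules():
    if name in wanted:
        module.register_forward_hook(save_output(name))

model(torch.randn(16, 8))
missing = [n for n in wanted if n not in features]
if missing:
    print(f"No features captured for {missing}; this is where the CKA "
          "computation ends up dividing by zero.")
```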

Yes, the current implementation reports the FLOPs averaged across all the examples in the validation set.
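
In other words, the reported number is a mean over per-example counts rather than a total or a single-example figure; a tiny illustration with hypothetical values:

```python
# Hypothetical per-example FLOP counts over the validation set.
flops_per_example = [1.2e9, 1.5e9, 0.9e9]
avg_flops = sum(flops_per_example) / len(flops_per_example)
print(f"Reported FLOPs: {avg_flops:.3e}")  # mean, not sum
```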

Hi, yes, pre-trained models like BERT and RoBERTa cannot be finetuned with sequence lengths longer than the maximum sequence length they were pre-trained on, as doing so will violate...
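
One way to check the limit before finetuning is to inspect the checkpoint's position-embedding size; a short sketch using the Hugging Face `transformers` library, with `roberta-base` as an example checkpoint:

```python
from transformers import AutoConfig, AutoTokenizer

# The number of learned position embeddings caps the usable sequence length.
config = AutoConfig.from_pretrained("roberta-base")
print(config.max_position_embeddings)   # 514 for RoBERTa (512 tokens + offsets)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
print(tokenizer.model_max_length)       # 512
```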