Lewis Bails

8 comments by Lewis Bails

Hey @AmaliePauli, I see you're using the BotXO weights for your BertTone model. Is that the version 1 or version 2 representations? https://github.com/botxo/nordic_bert

Thanks for that @fxmarty. That could certainly explain why PyTorch inference is faster on my machine! But regarding the ORT model performance, your models seem much quicker...

I had to make a few tweaks to your script to get around some errors that were popping up. Are you using `optimum==1.3.0`? These were my results on M1:

```
...
```

Also, I had to go up to `atol=3` to get the logits comparison between the vanilla ONNX model and ONNX-quantized model to pass. Seems large, but I'm not familiar enough...
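The tolerance check being described could look something like the sketch below, using `numpy.allclose` with a large absolute tolerance. The arrays are hypothetical stand-ins for the real vanilla-ONNX and quantized-ONNX logits, not actual model outputs:

```python
import numpy as np

# Hypothetical logits standing in for the vanilla ONNX and quantized ONNX outputs
onnx_logits = np.array([-3.32, 1.10, 3.737])
quantized_logits = np.array([-5.626, 1.45, 3.52])

# Element-wise check: |a - b| <= atol + rtol * |b|
# An atol of 3 is unusually loose, matching the comment above
close = np.allclose(onnx_logits, quantized_logits, atol=3.0, rtol=0.0)
print(close)
```

With these placeholder values, the largest element-wise gap is about 2.3, so the check passes at `atol=3` but would fail at a more conventional tolerance like `atol=1e-3`.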

Running it again with the random input ids:

```python
(Min, Max) PyTorch:                (-3.349, 3.752)
(Min, Max) ONNX Runtime:           (-3.32,  3.737)
(Min, Max) ONNX Runtime quantized: (-5.626, 3.52)
```
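A minimal sketch of how such a (min, max) summary could be produced, assuming the logits are available as a NumPy array (the values below are placeholders, not the actual model outputs):

```python
import numpy as np

# Placeholder logits standing in for one backend's output tensor
logits = np.array([-3.349, 0.12, 1.8, 3.752])

# Summarize the output range, as in the comparison above
summary = (round(float(logits.min()), 3), round(float(logits.max()), 3))
print(f"(Min, Max): {summary}")
```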

I didn't explicitly send it to the Neural Engine / M1 GPU, do you know if this is something that happens under the hood?

For those still looking this up in the future: I managed to get it working by reshaping my tensors and concatenating them along the CoreML-compliant dimension (in my case,...
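A rough sketch of that reshape-then-concatenate approach, with NumPy arrays standing in for the actual tensors; axis 0 is only illustrative, since the compliant dimension depends on the model:

```python
import numpy as np

# Stand-ins for the per-step tensors that needed reshaping
chunks = [np.arange(4).reshape(2, 2) for _ in range(3)]

# Flatten each tensor to a single row, then concatenate along one fixed axis
# (the "CoreML-compliant" axis here is illustrative only)
stacked = np.concatenate([c.reshape(1, -1) for c in chunks], axis=0)
print(stacked.shape)  # (3, 4)
```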