Lewis Bails

8 comments by Lewis Bails

Hey @AmaliePauli, I see you're using the BotXO weights for your BertTone model. Is that the version 1 or version 2 representations? https://github.com/botxo/nordic_bert

Thanks for that @fxmarty. That could certainly explain why PyTorch inference is faster on my machine! But regarding the ORT model performance, your models seem much quicker...

I had to make a few tweaks to your script to get around some errors that were popping up. Are you using `optimum==1.3.0`? These were my results on M1:

```
...
```

Also, I had to go up to `atol=3` to get the logits comparison between the vanilla ONNX model and ONNX-quantized model to pass. Seems large, but I'm not familiar enough...
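The tolerance check being described could look something like the sketch below, using `numpy.allclose` with a large absolute tolerance. The arrays are hypothetical stand-ins for the real vanilla-ONNX and quantized-ONNX logits, not actual model outputs:

```python
import numpy as np

# Hypothetical logits standing in for the vanilla ONNX and quantized ONNX outputs
onnx_logits = np.array([-3.32, 1.10, 3.737])
quantized_logits = np.array([-5.626, 1.45, 3.52])

# Element-wise check: |a - b| <= atol + rtol * |b|
# An atol of 3 is unusually loose, matching the comment above
close = np.allclose(onnx_logits, quantized_logits, atol=3.0, rtol=0.0)
print(close)
```

With these placeholder values, the largest element-wise gap is about 2.3, so the check passes at `atol=3` but would fail at a more conventional tolerance like `atol=1e-3`.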

Running it again with the random input ids:

```python
(Min, Max) PyTorch:                (-3.349, 3.752)
(Min, Max) ONNX Runtime:           (-3.32,  3.737)
(Min, Max) ONNX Runtime quantized: (-5.626, 3.52)
```
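A minimal sketch of how such a (min, max) summary could be produced, assuming the logits are available as a NumPy array (the values below are placeholders, not the actual model outputs):

```python
import numpy as np

# Placeholder logits standing in for one backend's output tensor
logits = np.array([-3.349, 0.12, 1.8, 3.752])

# Summarize the output range, as in the comparison above
summary = (round(float(logits.min()), 3), round(float(logits.max()), 3))
print(f"(Min, Max): {summary}")
```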

I didn't explicitly send it to the Neural Engine / M1 GPU, do you know if this is something that happens under the hood?

For those still looking this up in the future: I managed to get it working by reshaping my tensors and concatenating them along the CoreML-compliant dimension (in my case,...
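A rough sketch of that reshape-then-concatenate approach, with NumPy arrays standing in for the actual tensors; axis 0 is only illustrative, since the compliant dimension depends on the model:

```python
import numpy as np

# Stand-ins for the per-step tensors that needed reshaping
chunks = [np.arange(4).reshape(2, 2) for _ in range(3)]

# Flatten each tensor to a single row, then concatenate along one fixed axis
# (the "CoreML-compliant" axis here is illustrative only)
stacked = np.concatenate([c.reshape(1, -1) for c in chunks], axis=0)
print(stacked.shape)  # (3, 4)
```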