GLiNER
ONNX-converted model has slower inference
I fine-tuned the GLiNER small v2.1 model and created an ONNX version of it using the convert_to_onnx.ipynb example code. When I compared the inference time of the two models, the ONNX version took 50% more time than the original.
This is how I'm loading the model: model = GLiNER.from_pretrained(model_path, load_onnx_model=True, load_tokenizer=True)
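For reference, here is a minimal sketch of how I timed the two models. The benchmark helper is generic; `model.predict_entities` is the GLiNER prediction call, and the text, labels, and run counts are just placeholders from my setup. The warm-up runs matter because the first call can include one-time costs (ONNX session setup, tokenizer caching) that would otherwise skew the average.

```python
import time

def benchmark(fn, warmup=2, runs=10):
    """Return the mean wall-clock seconds per call of fn.

    Warm-up calls are excluded so one-time initialization costs
    (e.g. ONNX session creation) do not inflate the measurement.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Hypothetical usage with both model variants loaded:
# pytorch_time = benchmark(lambda: model_pt.predict_entities(text, labels))
# onnx_time = benchmark(lambda: model_onnx.predict_entities(text, labels))
# print(f"PyTorch: {pytorch_time:.4f}s  ONNX: {onnx_time:.4f}s")
```

Even with warm-up excluded, the ONNX model is consistently slower per call.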