fxmarty
I rebased following the changes to `ORTQuantizer`. As an example, we prompt for `howey/bert-base-uncased-sst2`:
```
Model label mapping: {'LABEL_0': 0, 'LABEL_1': 1}
Dataset label features: ClassLabel(num_classes=2, names=['negative', 'positive'], id=None)
Could...
```
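For context, a minimal sketch of the kind of check behind this prompt, where the model's `label2id` is compared against the dataset's `ClassLabel` feature; the `glue`/`sst2` dataset here is an assumption based on the model name:
```python
# Hedged sketch of the label-mapping check; the dataset choice is an assumption.
from datasets import load_dataset
from transformers import AutoConfig

config = AutoConfig.from_pretrained("howey/bert-base-uncased-sst2")
dataset = load_dataset("glue", "sst2", split="validation")

print("Model label mapping:", config.label2id)
print("Dataset label features:", dataset.features["label"])

# A mismatch between config.label2id keys (e.g. LABEL_0/LABEL_1) and
# ClassLabel.names (e.g. negative/positive) is what triggers the prompt.
```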
Running the same script on an AWS EC2 c6i instance gives:
```
cpu
['CPUExecutionProvider']
--- BATCH SIZE 1 ---
PyTorch: 1.99 s
ONNX Runtime: 5.56 s
ORT 179.97 % slower...
```
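For reference, a minimal sketch of a benchmark of this shape; the loop size and `export=True` flag are assumptions (older Optimum releases used `from_transformers=True`), not the exact script used here:
```python
# Rough latency comparison sketch, batch size 1; n=100 iterations is an assumption.
import time

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "howey/bert-base-uncased-sst2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("I love this movie!", return_tensors="pt")

pt_model = AutoModelForSequenceClassification.from_pretrained(model_id)
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

def bench(model, n=100):
    # Time n forward passes without gradient tracking.
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(n):
            model(**inputs)
        return time.perf_counter() - start

print(f"PyTorch: {bench(pt_model):.2f} s")
print(f"ONNX Runtime: {bench(ort_model):.2f} s")
```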
To follow up on this, I ran the onnxruntime profiler on my model (on my laptop) to see what is taking so much time. Here are my findings, with batch size...
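For anyone wanting to reproduce, a minimal sketch of enabling the ONNX Runtime profiler; `model.onnx` is a placeholder path and the BERT-like input signature is an assumption:
```python
# Enable ONNX Runtime's built-in profiler and dump a JSON trace.
import numpy as np
import onnxruntime as ort

options = ort.SessionOptions()
options.enable_profiling = True

session = ort.InferenceSession(
    "model.onnx", options, providers=["CPUExecutionProvider"]
)

# Dummy inputs matching a BERT-like signature (an assumption).
inputs = {
    "input_ids": np.ones((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
    "token_type_ids": np.zeros((1, 128), dtype=np.int64),
}
session.run(None, inputs)

# Writes a JSON trace (viewable in chrome://tracing) and returns its path.
profile_path = session.end_profiling()
print(profile_path)
```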
Thanks a lot for your help. Should I do the fusing by hand, or is this an optimization proposed by onnxruntime? I could not find resources on this in the...
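For reference, onnxruntime does expose transformer-specific fusions through its offline optimizer; a minimal sketch, with `num_heads`/`hidden_size` assumed for bert-base and placeholder file names:
```python
# Offline operator fusion with onnxruntime's transformer optimizer.
from onnxruntime.transformers import optimizer

optimized = optimizer.optimize_model(
    "model.onnx",        # placeholder path to the exported model
    model_type="bert",
    num_heads=12,        # bert-base values, assumed
    hidden_size=768,
)
optimized.save_model_to_file("model_optimized.onnx")
```
Alternatively, the same kind of fusion can be applied online by setting `sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL` on the inference session.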
This file is not accessible on Linux.
Awesome! I think it would be great to add tests, essentially checking that saving/reloading works well in the encoder-only and encoder-decoder cases.
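A minimal sketch of what such a round-trip test could look like for the encoder-only case; the tiny test model and the `export=True` flag are assumptions (older Optimum releases used `from_transformers=True`), and the encoder-decoder case would mirror this with e.g. `ORTModelForSeq2SeqLM`:
```python
# Hypothetical pytest sketch for the save/reload round trip.
import tempfile

from optimum.onnxruntime import ORTModelForSequenceClassification

def test_ort_model_save_and_reload():
    model = ORTModelForSequenceClassification.from_pretrained(
        "hf-internal-testing/tiny-random-bert", export=True
    )
    with tempfile.TemporaryDirectory() as tmp_dir:
        model.save_pretrained(tmp_dir)
        # Reloading from the saved directory should not require a re-export.
        reloaded = ORTModelForSequenceClassification.from_pretrained(tmp_dir)
        assert reloaded.model is not None
```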
We should probably do the same in exporters, actually.
@NouamaneTazi Why not use actual >2GB models, randomly initialized and saved from transformers (so no download time)? Then there would be no need for custom logic.
> @fxmarty Yes definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, `ORTModelForSequenceClassification` from a `BertForSequenceClassification` instance?

You can do...
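A minimal sketch of the combined suggestion, assuming recent Optimum (`export=True`; older releases used `from_transformers=True`) and config values chosen only to push the checkpoint past the 2GB protobuf limit:
```python
# Randomly initialize a large model from a config (no download),
# save it locally, then export/load it through Optimum.
import tempfile

from transformers import BertConfig, BertForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification

# Enlarged config (values assumed): roughly 1B parameters, ~4GB in fp32,
# comfortably above the 2GB ONNX protobuf threshold.
config = BertConfig(
    hidden_size=1536,
    num_hidden_layers=36,
    intermediate_size=6144,
)
model = BertForSequenceClassification(config)

with tempfile.TemporaryDirectory() as tmp_dir:
    model.save_pretrained(tmp_dir)
    # Export from the local directory, so no Hub download is needed.
    ort_model = ORTModelForSequenceClassification.from_pretrained(
        tmp_dir, export=True
    )
```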
Will work on this, first adding support for exporters.