fxmarty

Results: 332 comments by fxmarty

Hello, do you think the issue is related to running on Apple M1? Did you try to run on an x86_64 CPU? Could it be that PyTorch leverages the Apple...

I have never tried using onnxruntime with Apple devices, which is why I am curious about it. I am not sure about the logits; I only wanted to check runtime...

Ok, that's great! So apparently quantization on Apple M1 with onnxruntime is not that great. cc @mfuntowicz @hollance if you have any idea. I run Optimum 1.2.3.dev0 (dev version from...

@lewisbails @hollance I think there are two very distinct issues here: 1/ runtime latency/throughput, 2/ accuracy and other quality metrics. In my example above I was only focusing on runtime, as I...
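Since the comment above separates runtime from accuracy, here is a minimal, hypothetical sketch of the kind of latency measurement it refers to; `measure_latency` and `run_fn` are illustrative names, with `run_fn` standing in for one forward pass of whichever model is under test:

```python
import time

import numpy as np

def measure_latency(run_fn, n_iters=100, n_warmup=10):
    """Return mean/std wall-clock latency in ms of a callable (hypothetical helper)."""
    for _ in range(n_warmup):  # warm up caches, thread pools, etc.
        run_fn()
    timings = []
    for _ in range(n_iters):
        start = time.perf_counter()
        run_fn()
        timings.append((time.perf_counter() - start) * 1e3)
    return np.mean(timings), np.std(timings)
```

Accuracy, by contrast, has to be checked separately on a labeled evaluation set; a model can be fast after quantization yet lose accuracy, which is why the two issues are distinct.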

```
python run_glue.py --model_name_or_path philschmid/tiny-bert-sst2-distilled --task_name sst2 \
    --quantization_approach static --calibration_method percentile \
    --num_calibration_samples 104 --do_eval \
    --output_dir /tmp/quantized_distilbert_sst2 --max_eval_samples 100
```
works fine. So apparently we need the number...

Hello, I could not reproduce the issue with
```
torch==1.11.0+cu113
onnx==1.11.0
onnxruntime==1.11.1
optimum==main
Python 3.9.12
```
Could you try the following self-contained script? Run `python -m transformers.onnx output_dir -m transformers.onnx...
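The self-contained script itself is cut off in the preview above; as a rough, hypothetical stand-in for that kind of repro (the checkpoint name and output path are illustrative), one could export with the `transformers.onnx` CLI and run the result with onnxruntime:

```python
import subprocess

import onnxruntime as ort
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased"  # illustrative checkpoint

# Export the model to ONNX with the transformers.onnx CLI.
subprocess.run(
    ["python", "-m", "transformers.onnx", f"--model={model_id}", "onnx_out"],
    check=True,
)

# Run the exported graph with onnxruntime on a dummy input.
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello world!", return_tensors="np")

session = ort.InferenceSession("onnx_out/model.onnx")
outputs = session.run(None, dict(inputs))
print(outputs[0].shape)
```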

Closing this as https://github.com/huggingface/optimum/pull/271 was merged and includes this option. Thanks @sam-h-bean for pointing it out; let us know if this solves your issue!

In my experience, **by default** onnxruntime uses only physical cores, while PyTorch may use hyperthreading. For example, on my laptop with 10 physical cores (but 2 threads per core), `torch.get_num_threads()`...
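For illustration, a minimal sketch comparing the two defaults and pinning onnxruntime's intra-op thread count explicitly (the `model.onnx` path is a placeholder, and the halving assumes 2 threads per physical core):

```python
import os

import torch
import onnxruntime as ort

# os.cpu_count() counts logical cores, i.e. hyperthreads included.
print("logical cores:", os.cpu_count())
# PyTorch's intra-op default may match the logical core count.
print("torch threads:", torch.get_num_threads())

# onnxruntime defaults to physical cores; the count can be set explicitly.
options = ort.SessionOptions()
options.intra_op_num_threads = os.cpu_count() // 2  # e.g. pin to physical cores
# session = ort.InferenceSession("model.onnx", sess_options=options)  # placeholder path
```

This mismatch in thread defaults is one reason raw latency comparisons between PyTorch and onnxruntime can be misleading unless both are pinned to the same core count.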

https://github.com/huggingface/optimum/pull/271 was merged; it helps to fix the number of cores used.

Hello @MiladMolazadeh, by coincidence I ran into the same issue today! Would https://github.com/huggingface/optimum/pull/271 solve your issue? I propose the following workflow, provided the above code is merged: ```python from...
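The proposed workflow is truncated above; as a hedged sketch of what it might look like once the PR is merged (the checkpoint name and thread count are illustrative, and the `session_options` argument is an assumption about the API that PR adds):

```python
import onnxruntime as ort
from optimum.onnxruntime import ORTModelForSequenceClassification

# Cap the intra-op thread pool before the session is created.
options = ort.SessionOptions()
options.intra_op_num_threads = 4  # illustrative value

# Illustrative checkpoint; session_options is assumed to be forwarded
# to the underlying onnxruntime.InferenceSession.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
    session_options=options,
)
```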