Xavier Dupré
I would update the converter to support external weights (see https://github.com/huggingface/optimum/issues/1642#issuecomment-1910294822).
Do you have an API in mind?
> We also need to add fp8 support for MatMulInteger to support dynamic quantization for fp8. The function defined by CUDA [cublasLtMatMul](https://docs.nvidia.com/cuda/cublas/index.html?highlight=cublasltmatmul#cublasltmatmul) allows more than one option for the output...
The only thing that would require a larger consensus is the method I used to estimate the scale for float 8. Models are usually trained with float 8 and the...
DFT has two implementations: a naive one (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/signal/dft.cc#L185) and a faster one used when the dimension is a power of 2. The naive one is used in this case. I checked...
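To illustrate the difference between the two code paths, here is a minimal pure-Python sketch. It is not the onnxruntime implementation: the function names (`naive_dft`, `fft_radix2`, `dft`) are hypothetical, and the fast path is assumed to be a textbook radix-2 Cooley-Tukey FFT. The naive version costs O(n²), while the power-of-2 version costs O(n log n), which is why a dimension that is not a power of 2 can be noticeably slower.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition; works for any length."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def is_power_of_two(n):
    """True when n is 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def fft_radix2(x):
    """Recursive radix-2 FFT, O(n log n); only valid when len(x) is a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def dft(x):
    """Dispatch: fast path for power-of-2 lengths, naive path otherwise."""
    x = [complex(v) for v in x]
    return fft_radix2(x) if is_power_of_two(len(x)) else naive_dft(x)
```

Both paths return the same values for a power-of-2 length; only the amount of work differs. Padding the input to the next power of 2 changes the result, so it is only an option when the application tolerates it.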
I tried with a dummy set and it works. Maybe pandas changed the type of a column because one row is misaligned or for some other reason. ```python import numpy...
Feel free to contribute and choose the method you think is the best.
That would work.
If you are using a loop, it is not really surprising. There is no parallelization even though each row is processed independently.
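Because each row is independent, the loop can in principle be replaced by a parallel map. The sketch below only illustrates that idea with the standard library: `process_row` is a hypothetical stand-in for whatever is done per row, not the code discussed in the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def process_row(row):
    # Hypothetical per-row work; it depends only on its own row.
    return sum(v * v for v in row)


def process_sequential(rows):
    """Plain loop: rows are handled one after another."""
    return [process_row(r) for r in rows]


def process_parallel(rows, max_workers=4):
    """Same computation dispatched to a thread pool; result order is preserved."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_row, rows))
```

In CPython, threads only bring a speed-up when the per-row work releases the GIL (for example a call into native code); for pure-Python work, a process pool or a single batched call is usually the better option.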
One issue is that StringNormalizer is defined in onnx. To change its behaviour, it has to be changed in both onnx and onnxruntime, which is a long process. onnxruntime-extensions is a project implementing custom...