Guillaume Klein
# What does this PR do? The lang tokens were missing from `M2M100Tokenizer.get_vocab`. The `get_vocab` method is updated to match other multilingual tokenizers such as `NllbTokenizer` and `MBart50Tokenizer`. ## Before...
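As a minimal sketch of the pattern shared by `NllbTokenizer` and `MBart50Tokenizer`, `get_vocab` can be built from the base ids and then merged with the added-token encoder so the language codes are included. The `ToyTokenizer` class below is illustrative, not the actual Hugging Face implementation:

```python
# Sketch of the multilingual get_vocab pattern: build the vocab from the
# base ids, then merge the added tokens (which include the lang codes).
# ToyTokenizer is a hypothetical stand-in for the real tokenizer classes.
class ToyTokenizer:
    def __init__(self, base_tokens, lang_codes):
        self._base = {tok: i for i, tok in enumerate(base_tokens)}
        # Language codes get ids past the base vocabulary, mirroring how
        # added/special tokens are appended after vocab_size.
        self.added_tokens_encoder = {
            code: len(base_tokens) + i for i, code in enumerate(lang_codes)
        }

    @property
    def vocab_size(self):
        return len(self._base)

    def convert_ids_to_tokens(self, i):
        return {v: k for k, v in self._base.items()}[i]

    def get_vocab(self):
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)  # include the lang tokens
        return vocab
```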
# Summary https://github.com/intel/mkl-dnn/commit/274be8228a0dba6391c2769c37cd68a3bb730fbf added AVX2 optimizations for igemm kernels (as discussed in https://github.com/intel/mkl-dnn/issues/532). However, the execution appears to be 1.4x slower than using version v0.21 compiled with Intel MKL. In...
Fairseq recently released a new version 0.12.1 to PyPI. This version is breaking the conversion of M2M-100 which fails with the following error: ```text Traceback (most recent call last): File...
The following binary operators currently do not support broadcasting:

* `ops::Add`
* `ops::Mul`
* `ops::Sub`

One should instead call lower-level primitives such as `add_depth_broadcast`, which require device and type...
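A NumPy sketch of the depth-broadcast case that `add_depth_broadcast` covers, and that a general broadcasting `ops::Add` would need to handle: a `[depth]` vector is added to every row of a `[batch, depth]` input. The loop makes the broadcast explicit:

```python
import numpy as np

# Reference semantics for depth broadcasting: the bias vector is reused
# for every row of the batched input instead of requiring equal shapes.
def add_depth_broadcast(x, bias):
    batch, depth = x.shape
    assert bias.shape == (depth,)
    out = np.empty_like(x)
    for b in range(batch):  # the broadcast: one bias row per input row
        out[b] = x[b] + bias
    return out
```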
The MatMul API from [cublasLt](https://docs.nvidia.com/cuda/cublas/index.html#using-the-cublasLt-api) can be configured to also add the bias and apply ReLU. We should look into this.
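For reference, the fused epilogue computes `relu(A @ B + bias)` in a single kernel instead of three separate passes over the output. A NumPy sketch of the intended semantics (not the cublasLt API itself):

```python
import numpy as np

# Reference semantics of the fused GEMM epilogue: bias addition and ReLU
# applied to the matmul output, here as three NumPy steps for clarity.
def matmul_bias_relu(a, b, bias):
    c = a @ b                  # GEMM
    c = c + bias               # bias added per output column
    return np.maximum(c, 0.0)  # ReLU epilogue
```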
The GEMM backend is selected at runtime depending on the requested compute type and CPU information. The dispatch to the correct implementation is done with a switch statement: https://github.com/OpenNMT/CTranslate2/blob/3f6ac9cb22528c4b17b65783811f795ac6a85538/src/cpu/primitives.cc#L533-L612 This...
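The dispatch idea can be sketched as a function mapping the requested compute type and detected CPU features to an implementation. The backend names and feature flags below are illustrative, not CTranslate2's actual identifiers:

```python
# Sketch of runtime GEMM backend selection: prefer the widest supported
# vector extension for the requested compute type, else fall back.
# All names here are hypothetical placeholders.
def select_gemm_backend(compute_type, cpu_features):
    if compute_type == "int8":
        if "avx512" in cpu_features:
            return "int8_avx512"
        if "avx2" in cpu_features:
            return "int8_avx2"
        return "int8_fallback"
    if compute_type == "float32":
        return "float32_blas"
    raise ValueError(f"unsupported compute type: {compute_type}")
```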
The dequantization of GEMM output on CPU is currently not vectorized: https://github.com/OpenNMT/CTranslate2/blob/v1.17.0/src/ops/dequantize_cpu.cc The performance could be slightly improved by vectorizing this operation and fusing bias addition and ReLU. This is...
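As a reference for the fusion, a NumPy sketch of dequantizing the int32 GEMM output, adding the bias, and applying ReLU in one pass. The single scalar scales per operand are a simplification of the real per-row/per-column scale layout:

```python
import numpy as np

# Reference semantics for fused dequantize + bias + ReLU on int32 GEMM
# output: divide by the product of the input scales, add bias, clamp.
def dequantize_bias_relu(c_int32, a_scale, b_scale, bias):
    c = c_int32.astype(np.float32) / (a_scale * b_scale)
    c = c + bias
    return np.maximum(c, 0.0)
```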
Similar to the recent CTranslate2 work (https://github.com/OpenNMT/CTranslate2/pull/769), we should publish ARM64 wheels for macOS. I had a first look but did not immediately find the correct configuration to cross-compile ICU...