CTranslate2
Fast inference engine for Transformer models
Convert the model (currently supports OpenNMT-py) and keep it in memory => inference without saving the model to disk. Customize the wrapper to reduce the memory used. TODO: implementations for other converters
Is MPS support on the roadmap? I wanted to use faster-whisper on my Mac, but it only uses the CPU.
Excellent work! Do you support the Qwen-1_8B-Chat model? Looking forward to your reply.
Hi, is there any way to extract the last hidden state (before the lm_head dense layer) of T5 or GPT models? There are some kinds of models that need to take...
I've been examining the encoded output of Whisper, and I see that the results differ depending on whether the same input is sent in as a batch or one by one. I made a...
Operating system: Ubuntu 22.04.2, Python 3.10.6, CTranslate2 3.16. When exporting the bigscience/bloomz model using `ct2-transformers-converter --force --model bigscience/bloomz --output_dir bloomz --quantization float16`, the conversion process works well for other bloom...
This PR adds quantized `Conv1D` inference on top of #1597. With the previous `int8` quantization implementation, this quantized inference couldn't bring any speedup because quantization itself was the bottleneck. To alleviate that,...
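For context on why quantization can dominate the runtime: before an `int8` kernel can run, the float inputs must be scanned for their range, scaled, and rounded, which is itself a full pass over the data. A minimal sketch of symmetric per-tensor `int8` quantization is shown below; this is a simplified illustration of the general technique, not CTranslate2's actual kernel (the `Quantized` struct and function names are hypothetical).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor int8 quantization: one scale maps the largest
// absolute value onto the int8 range [-127, 127].
struct Quantized {
  std::vector<int8_t> values;
  float scale;  // multiply by this to recover approximate floats
};

Quantized quantize_int8(const std::vector<float>& x) {
  // First pass: find the dynamic range (this scan is part of the cost).
  float max_abs = 0.f;
  for (float v : x)
    max_abs = std::max(max_abs, std::fabs(v));
  const float scale = max_abs > 0.f ? max_abs / 127.f : 1.f;
  // Second pass: scale and round every element.
  Quantized q{std::vector<int8_t>(x.size()), scale};
  for (std::size_t i = 0; i < x.size(); ++i)
    q.values[i] = static_cast<int8_t>(std::lround(x[i] / scale));
  return q;
}

std::vector<float> dequantize(const Quantized& q) {
  std::vector<float> out(q.values.size());
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = q.values[i] * q.scale;
  return out;
}
```

Because both passes touch every input element, a convolution whose arithmetic is cheap relative to its input size can end up spending most of its time here, which matches the bottleneck described in the PR.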
1. initial_prompt: When I convert the official Whisper model to CTranslate2 format, I can use `initial_prompt` normally. When I convert my fine-tuned Whisper model to CTranslate2 format and use `initial_prompt`, I get...
This PR adds armv7 support: * Implement slower generic versions of `div`, `mul_add`, `reduce_add`, `reduce_max`. * Rename `CT2_ARM64_BUILD` to `CT2_ARM_BUILD`. Tested this on a Galaxy S21 with the library built for `armv7`,...
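The generic versions mentioned in the PR might look something like the scalar loops below: plain C++ standing in for the NEON intrinsics used on armv8. The signatures are illustrative (assumed for this sketch), not copied from the CTranslate2 source; only the operation names come from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Scalar fallbacks for targets without the 64-bit NEON intrinsics.
// Each loop handles one element per iteration instead of a SIMD lane.

// Element-wise division: c[i] = a[i] / b[i].
void div(const float* a, const float* b, float* c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    c[i] = a[i] / b[i];
}

// Fused multiply-add: c[i] += a[i] * b[i].
void mul_add(const float* a, const float* b, float* c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    c[i] += a[i] * b[i];
}

// Sum of all elements.
float reduce_add(const float* x, std::size_t n) {
  float sum = 0.f;
  for (std::size_t i = 0; i < n; ++i)
    sum += x[i];
  return sum;
}

// Maximum element (n must be > 0).
float reduce_max(const float* x, std::size_t n) {
  float m = x[0];
  for (std::size_t i = 1; i < n; ++i)
    m = std::max(m, x[i]);
  return m;
}
```

Scalar loops like these are slower than vectorized code but portable, which is the trade-off the PR accepts on armv7.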
Hi, CTranslate2 uses oneDNN. The latest oneDNN versions have [support for AMD GPUs](https://github.com/oneapi-src/oneDNN/tree/master/src/gpu/amd). This [requires Intel oneAPI DPC++](https://developer.codeplay.com/products/oneapi/amd/2023.0.0/guides/get-started-guide-amd). The same approach could [potentially enable NVIDIA GPU](https://developer.codeplay.com/products/oneapi/nvidia/2023.0.0/guides/get-started-guide-nvidia) support too. It would help...