CTranslate2
Fast inference engine for Transformer models
Please help: how can I use CTranslate2 with CUDA 10.2? The problem is that the Nvidia Jetson Nano does not work with CUDA 11 and 12 (installation of CUDA 11 or...
Baichuan2 is a generative model similar to Llama. I found the following two differences: - 1. qkv are merged as W_pack, so I changed the file /ctranslate2/converters/transformers.py ![image](https://github.com/guillaumekln/faster-whisper/assets/122880585/f48aa58b-cc8a-4472-a27e-46ffd887afeb) - 2. rotary...
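The W_pack handling mentioned above can be sketched with NumPy: assuming the merged projection simply stacks the Q, K, and V weights along the output dimension (the dimensions and names here are illustrative, not taken from the actual converter), splitting it back apart looks like:

```python
import numpy as np

# Hypothetical hidden size; Baichuan2's W_pack stacks Q, K, V row-wise,
# so its shape is (3 * hidden, hidden).
hidden = 8
w_pack = np.arange(3 * hidden * hidden, dtype=np.float32).reshape(3 * hidden, hidden)

# Split the merged projection into separate Q, K, V weights, the form a
# CTranslate2 converter would register for the attention layer.
w_q, w_k, w_v = np.split(w_pack, 3, axis=0)

assert w_q.shape == w_k.shape == w_v.shape == (hidden, hidden)
assert np.array_equal(w_q, w_pack[:hidden])
```

Whether the stacking order is actually Q, K, V (rather than interleaved) has to be checked against the original checkpoint before adapting the converter.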
This is a feature request to implement a custom 4D mask for Llama (and possibly any other model), similar to https://github.com/huggingface/transformers/pull/27539
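For context on the request above: a custom 4D additive attention mask has shape (batch, heads, query_len, key_len), with 0 where attention is allowed and a large negative value where it is blocked. A minimal NumPy sketch of building such a mask (illustrative only, not CTranslate2 code):

```python
import numpy as np

batch, heads, q_len, kv_len = 1, 2, 4, 4

# Additive mask: 0.0 = attend, -inf = blocked.
mask = np.zeros((batch, heads, q_len, kv_len), dtype=np.float32)

# Example constraint: causal masking, applied identically to every head.
# A full 4D mask allows this to differ per batch element and per head.
causal = np.triu(np.ones((q_len, kv_len), dtype=bool), k=1)
mask[:, :, causal] = -np.inf
```

The point of the 4D shape is that the last assignment could instead vary per head or per sequence, which a 2D causal mask cannot express.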
Is this related to CTranslate2? The following is copied from [this issue](https://github.com/SYSTRAN/faster-whisper/issues/618). I have made a test of batching in faster-whisper, but the faster-whisper batch encode consumes multiple times as...
From https://github.com/guillaumekln/faster-whisper/issues/65 --- Some CPUs such as ARM Neoverse-N1 (Oracle Cloud free tier) support FP16 computation. It would be nice to have this feature because there could be up to...
I didn't see the documentation being updated regarding recent additions like distil-whisper and Mistral. It'd be nice to have that updated, as well as an example of each, like the...
Hey everyone! I believe I have found a bug in the GEMM operator. To the best of my knowledge, the output shape of the `c` StorageView in the GEMM operator...
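For reference on the shape question raised above: a conventional GEMM computes `c = alpha * a @ b + beta * c`, so multiplying an (m, k) matrix by a (k, n) matrix must yield an (m, n) output. A quick NumPy check of that expectation (this illustrates the convention only, not the CTranslate2 internals):

```python
import numpy as np

m, k, n = 3, 4, 5
a = np.random.rand(m, k).astype(np.float32)
b = np.random.rand(k, n).astype(np.float32)

# The GEMM output buffer is expected to have shape (m, n).
c = a @ b
assert c.shape == (m, n)
```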
Hello, can anyone help me with how to run the core42/jais-13b-chat model with CTranslate2? I ran the conversion script but ran into an error. Script used: ```ct2-transformers-converter --model core42/jais-13b-chat --quantization bfloat16 --output_dir...
I have spent time looking at the documentation but did not manage to find the proper way to get the prediction probabilities of all tokens. Also, how can I get the...
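One way to approach the question above, assuming the CTranslate2 scoring API: `Translator.score_batch` (and `Generator.score_batch`) returns per-token log-probabilities, and `translate_batch(..., return_scores=True)` returns a cumulative score per hypothesis. Only the conversion from log-probabilities to probabilities is sketched here, with stand-in values in place of a real `score_batch` result:

```python
import math

# Stand-in for the per-token log-probabilities that score_batch would
# return for one sequence (the numbers below are illustrative only).
token_log_probs = [-0.105, -2.303, -0.511]

# Convert each log-probability to a probability in [0, 1].
token_probs = [math.exp(lp) for lp in token_log_probs]
```

Note this gives the probability of the tokens that were actually scored; getting the full distribution over the vocabulary at each step is a different question and is not covered by the scoring API sketched here.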