CTranslate2
Fast inference engine for Transformer models
Improve the printed content of StorageView, limited to 6 values per line. Examples: ``` // Example 2: 2D Matrix (2D Tensor) { // Define the shape of the StorageView (12x12 matrix)...
After compiling manually (cmake and make with many switches and changes to the CMakeLists.txt file), Python kvetches that: ``` clang++: error: unknown argument: '-fno-openmp-implicit-rpath' clang++: error: unknown argument: '-fno-openmp-implicit-rpath' error: command...
Can any colleague help with an example of inference with the Gemma model in CTranslate2? Unfortunately, there is no information about this model in the documentation. Thx
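Since the documentation has no Gemma example, here is a minimal sketch of how generation with a converted Gemma model might look, following the same `Generator` pattern used for other decoder-only models. The model id `google/gemma-2b-it`, the output directory `gemma-2b-it-ct2`, and the helper `build_gemma_prompt` are assumptions for illustration, not confirmed by the project; the turn markers follow Gemma's published chat format.

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma's chat turn markers
    (<start_of_turn>user ... <end_of_turn> <start_of_turn>model)."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


if __name__ == "__main__":
    # These imports and the model paths are illustrative; the model is
    # assumed to have been converted beforehand with something like:
    #   ct2-transformers-converter --model google/gemma-2b-it \
    #       --output_dir gemma-2b-it-ct2
    import ctranslate2
    import transformers

    generator = ctranslate2.Generator("gemma-2b-it-ct2", device="cuda")
    tokenizer = transformers.AutoTokenizer.from_pretrained("google/gemma-2b-it")

    prompt = build_gemma_prompt("Translate 'hello' into French.")
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    results = generator.generate_batch(
        [tokens], max_length=128, include_prompt_in_result=False
    )
    print(tokenizer.decode(results[0].sequences_ids[0]))
```

Whether the stock Gemma converter preserves the chat template is worth verifying against the tokenizer; the sketch only shows the call sequence.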
Support 4-bit quantization with AWQ. There are 2 stable versions available: ``gemm`` and ``gemv``. Currently, I have only added AWQ to the Llama and Mistral converters. Other models could be added...
I'm planning to use `CTranslate2` from Rust with [ctranslate2-rs](https://github.com/jkawamoto/ctranslate2-rs) to create a cross-platform desktop app for offline multilingual translation using `facebook/nllb-200-distilled-600M`. I used `ct2-transformers-converter` to convert it to ctranslate...
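For reference, the Python side of the same NLLB workflow can be sketched as below; the ctranslate2-rs bindings mirror this API. The output directory `nllb-200-600M-ct2` and the helper `make_target_prefixes` are illustrative assumptions; the language codes are the FLORES-200 codes NLLB uses (e.g. `eng_Latn`, `fra_Latn`).

```python
def make_target_prefixes(lang_code: str, batch_size: int) -> list:
    """NLLB expects the target-language token as a decoding prefix
    for every sentence in the batch."""
    return [[lang_code] for _ in range(batch_size)]


if __name__ == "__main__":
    # Illustrative paths; the model is assumed to have been converted with:
    #   ct2-transformers-converter --model facebook/nllb-200-distilled-600M \
    #       --output_dir nllb-200-600M-ct2
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator("nllb-200-600M-ct2", device="cpu")
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-600M", src_lang="eng_Latn"
    )

    sentences = ["The weather is nice today."]
    source = [
        tokenizer.convert_ids_to_tokens(tokenizer.encode(s)) for s in sentences
    ]
    results = translator.translate_batch(
        source, target_prefix=make_target_prefixes("fra_Latn", len(source))
    )
    for result in results:
        # Drop the leading language token before decoding.
        tokens = result.hypotheses[0][1:]
        print(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
```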
From my own experience with text generation models, I found that quantizing the output and embedding tensors to f16 and the other tensors to q6_k (or q5_k) gives smaller...
When I run the following converter script: `ct2-transformers-converter --model facebook/nllb-200-distilled-1.3B --quantization float16 --output_dir nllb-200-distilled-1.3B-ct2-float16`, I now get the following error: `config.json: 100%|████████████████████████████████████████████████████████████████████| 808/808 [00:00
@minhthuc2502 @alexlnkp **Description** What type of cache is currently implemented in CTranslate2? Is it static or dynamic? Could we achieve a speed-up if the cache implementation is changed for the...
Hi, Will it be possible to support: https://huggingface.co/collections/CohereForAI/c4ai-aya-23-664f4cda3fa1a30553b221dc ??? Thx
```
import ctranslate2, psutil, os, transformers, time, torch

generator = ctranslate2.Generator("/ct2opt-1.3b", tensor_parallel=True, device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-1.3b")

def generate_text(text):
    outputs = []
    for prompt in text:
        start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
        results = generator.generate_batch([start_tokens], max_length=30, include_prompt_in_result=False)
        # Collect every output instead of returning inside the loop,
        # which would stop after the first prompt.
        outputs.append(tokenizer.decode(results[0].sequences_ids[0]))
    return outputs

text = ["Hello, I...
```