CTranslate2

Fast inference engine for Transformer models

Results: 173 CTranslate2 issues

Improve the printed content of StorageView, limiting output to 6 values per line. Examples: ``` // Example 2: 2D Matrix (2D Tensor) { // Define the shape of the StorageView (12x12 matrix)...

After compiling manually (cmake and make with many switches and changes to the CMakeLists.txt file), Python complains that: ``` clang++: error: unknown argument: '-fno-openmp-implicit-rpath' clang++: error: unknown argument: '-fno-openmp-implicit-rpath' error: command...

Can any colleague help with an example of inference with the Gemma model in CTranslate2? Unfortunately, there is no information about this model in the documentation. Thx

Support 4-bit quantization with AWQ. There are 2 stable versions available: ``gemm`` and ``gemv``. Currently, I only add AWQ for the Llama and Mistral converters. Other models could be added...

I'm planning to use `CTranslate2` from Rust with [ctranslate2-rs](https://github.com/jkawamoto/ctranslate2-rs) to create a cross-platform desktop app for offline multilingual translation using `facebook/nllb-200-distilled-600M`. I used the `ct2-transformers-converter` to convert it to ctranslate...

From my own experience in text generation models, I found out that quantizing the output and embed tensors to f16 and the other tensors to q6_k (or q5_k) gives smaller...

When I run the following converter script: `ct2-transformers-converter --model facebook/nllb-200-distilled-1.3B --quantization float16 --output_dir nllb-200-distilled-1.3B-ct2-float16 ` I now get the following error: `config.json: 100%|████████████████████████████████████████████████████████████████████| 808/808 [00:00

@minhthuc2502 @alexlnkp **Description** What type of cache is currently implemented in CTranslate2? Is it static or dynamic? Could we achieve a speed-up if the cache implementation is changed for the...

enhancement

Hi, will it be possible to support https://huggingface.co/collections/CohereForAI/c4ai-aya-23-664f4cda3fa1a30553b221dc? Thx

enhancement

```
import ctranslate2, psutil, os, transformers, time, torch

generator = ctranslate2.Generator("/ct2opt-1.3b", tensor_parallel=True, device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-1.3b")

def generate_text(text):
    for prompt in text:
        start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
        results = generator.generate_batch([start_tokens], max_length=30, include_prompt_in_result=False)
        output = tokenizer.decode(results[0].sequences_ids[0])
    return output

text = ["Hello, I...
```