CTranslate2
Fast inference engine for Transformer models
Improve the printed content of StorageView, limited to 6 values per line. Examples: ``` // Example 2: 2D Matrix (2D Tensor) { // Define the shape of the StorageView (12x12 matrix)...
After compiling manually (cmake and make with many switches and changes to the CMakeLists.txt file), Python kvetches that: ``` clang++: error: unknown argument: '-fno-openmp-implicit-rpath' clang++: error: unknown argument: '-fno-openmp-implicit-rpath' error: command...
Can any colleague help with an example of inference with the Gemma model in CTranslate2? Unfortunately, there is no information about this model in the documentation. Thx
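Since the documentation has no Gemma example, here is a minimal sketch of how generation with a converted Gemma model might look, following the same `Generator` pattern used for other decoder-only models. The model id `google/gemma-2b-it`, the output directory `gemma-2b-it-ct2`, and the helper `build_gemma_prompt` are assumptions for illustration, not confirmed by the project; the turn markers follow Gemma's published chat format.

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma's chat turn markers
    (<start_of_turn>user ... <end_of_turn> <start_of_turn>model)."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


if __name__ == "__main__":
    # These imports and the model paths are illustrative; the model is
    # assumed to have been converted beforehand with something like:
    #   ct2-transformers-converter --model google/gemma-2b-it \
    #       --output_dir gemma-2b-it-ct2
    import ctranslate2
    import transformers

    generator = ctranslate2.Generator("gemma-2b-it-ct2", device="cuda")
    tokenizer = transformers.AutoTokenizer.from_pretrained("google/gemma-2b-it")

    prompt = build_gemma_prompt("Translate 'hello' into French.")
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    results = generator.generate_batch(
        [tokens], max_length=128, include_prompt_in_result=False
    )
    print(tokenizer.decode(results[0].sequences_ids[0]))
```

Whether the stock Gemma converter preserves the chat template is worth verifying against the tokenizer; the sketch only shows the call sequence.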
Support 4-bit quantization with AWQ. There are 2 stable versions available: ``gemm`` and ``gemv``. Currently, I have only added AWQ to the Llama and Mistral converters. Other models could be added...
I'm planning to use `CTranslate2` from Rust with [ctranslate2-rs](https://github.com/jkawamoto/ctranslate2-rs) to create a cross-platform desktop app for offline multilingual translation using `facebook/nllb-200-distilled-600M`. I used `ct2-transformers-converter` to convert it to ctranslate...
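For reference, the Python side of the same NLLB workflow can be sketched as below; the ctranslate2-rs bindings mirror this API. The output directory `nllb-200-600M-ct2` and the helper `make_target_prefixes` are illustrative assumptions; the language codes are the FLORES-200 codes NLLB uses (e.g. `eng_Latn`, `fra_Latn`).

```python
def make_target_prefixes(lang_code: str, batch_size: int) -> list:
    """NLLB expects the target-language token as a decoding prefix
    for every sentence in the batch."""
    return [[lang_code] for _ in range(batch_size)]


if __name__ == "__main__":
    # Illustrative paths; the model is assumed to have been converted with:
    #   ct2-transformers-converter --model facebook/nllb-200-distilled-600M \
    #       --output_dir nllb-200-600M-ct2
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator("nllb-200-600M-ct2", device="cpu")
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-600M", src_lang="eng_Latn"
    )

    sentences = ["The weather is nice today."]
    source = [
        tokenizer.convert_ids_to_tokens(tokenizer.encode(s)) for s in sentences
    ]
    results = translator.translate_batch(
        source, target_prefix=make_target_prefixes("fra_Latn", len(source))
    )
    for result in results:
        # Drop the leading language token before decoding.
        tokens = result.hypotheses[0][1:]
        print(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
```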
From my own experience with text generation models, I found that quantizing the output and embedding tensors to f16 and the other tensors to q6_k (or q5_k) gives smaller...
When I run the following converter script: `ct2-transformers-converter --model facebook/nllb-200-distilled-1.3B --quantization float16 --output_dir nllb-200-distilled-1.3B-ct2-float16`, I now get the following error: `config.json: 100%|████████████████████████████████████████████████████████████████████| 808/808 [00:00
@minhthuc2502 @alexlnkp **Description** What type of cache is currently implemented in CTranslate2? Is it static or dynamic? Could we achieve a speed-up if the cache implementation is changed for the...
Hi, Will it be possible to support: https://huggingface.co/collections/CohereForAI/c4ai-aya-23-664f4cda3fa1a30553b221dc ??? Thx
```
import ctranslate2, psutil, os, transformers, time, torch

generator = ctranslate2.Generator("/ct2opt-1.3b", tensor_parallel=True, device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-1.3b")

def generate_text(text):
    outputs = []
    for prompt in text:
        start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
        results = generator.generate_batch([start_tokens], max_length=30, include_prompt_in_result=False)
        # Collect every output instead of returning inside the loop,
        # which would stop after the first prompt.
        outputs.append(tokenizer.decode(results[0].sequences_ids[0]))
    return outputs

text = ["Hello, I...
```