
Fast inference engine for Transformer models

Results: 173 CTranslate2 issues

I am struggling to load a quantized model because I lack sufficient CPU memory to hold the weights. Usually I would split the weights into multiple shards and then load them...

enhancement
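
For context, the shard-then-load pattern the reporter describes can be sketched in plain PyTorch; the file names and the two-way split below are illustrative, and this is not a CTranslate2 API:

```py
import torch

# Split a checkpoint's state dict into shards so each file is smaller...
state_dict = torch.load("model.pt")  # hypothetical single-file checkpoint
keys = list(state_dict)
half = len(keys) // 2
torch.save({k: state_dict[k] for k in keys[:half]}, "shard0.pt")
torch.save({k: state_dict[k] for k in keys[half:]}, "shard1.pt")

# ...then restore shard by shard instead of reading one huge file at once.
merged = {}
for shard in ("shard0.pt", "shard1.pt"):
    merged.update(torch.load(shard))
```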

Hi, Is there any way to convert DeBERTa models to CTranslate2, since the architecture is somewhat close to BERT? Currently the conversion command ct2-transformers-converter produces this error: No conversion is registered...

enhancement
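
For reference, the failing conversion can presumably also be reproduced through the Python converter API; the checkpoint name below is only an example:

```py
from ctranslate2.converters import TransformersConverter

# Example checkpoint; any DeBERTa model should hit the same error.
converter = TransformersConverter("microsoft/deberta-v3-base")
converter.convert("deberta_ct2")  # reportedly fails: "No conversion is registered..."
```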

I am wondering if there is any way to manually clear the static prompt cache for generator.generate_tokens. We are running an algorithm where a lot of computations can be...
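
For reference, a minimal sketch of how the static prompt cache gets populated, assuming a converted model in model_dir and pre-tokenized inputs; Generator.generate_batch exposes static_prompt and cache_static_prompt options, but no public method to clear the cached state appears to be documented, which is what this issue asks for:

```py
import ctranslate2

generator = ctranslate2.Generator("model_dir", device="cuda")  # hypothetical model path

system = ["▁You", "▁are", "▁helpful", "."]  # static prefix, cached across calls
prompt = ["▁Hello", "!"]                    # per-request tokens

# The static prompt state is computed once and reused by later generations.
results = generator.generate_batch(
    [prompt],
    static_prompt=system,
    cache_static_prompt=True,
    max_length=64,
)
print(results[0].sequences[0])
```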

Hi. I have been doing some benchmarks on an NVIDIA V100 32GB GPU. First, I benchmarked Llama2-7B-chat using Hugging Face Transformers and CTranslate2. I saw reduced latency when using CT2 (12...
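
A minimal latency measurement on the CTranslate2 side might look like the sketch below; the model path and the pre-tokenized prompt are hypothetical, and the tokenizer step is omitted:

```py
import time
import ctranslate2

generator = ctranslate2.Generator("llama2-7b-chat-ct2", device="cuda")  # hypothetical path
prompt = [["<s>", "▁Hello", ",", "▁world"]]  # pre-tokenized prompt (tokenizer omitted)

start = time.perf_counter()
generator.generate_batch(prompt, max_length=128)
print(f"latency: {time.perf_counter() - start:.2f}s")
```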

Hi, It would be great to support constrained generation, following the work of https://github.com/outlines-dev/outlines. This should not be too much work to integrate, but this should probably not be in...

enhancement

```py
import torch
import ctranslate2

x0 = torch.ones((2, 4), dtype=torch.int32, device="cuda:0")
y0 = ctranslate2.StorageView.from_array(x0)
print(f"Original tensor is on {x0.device} and StorageView is on {y0.device}:{y0.device_index}")

x1 = torch.ones((2, 4), dtype=torch.int32, device="cuda:1")
...
```

enhancement

Do you have any plans to support ELECTRA models? If there are any contributions I can make to help with this, I would be glad to. Thank you

enhancement

Model: Llama-2-7b-chat. CT2 version: 3.19.0. I found that when I use an int8* quantization, the inference speed depends drastically on num_hypotheses. I tried to benchmark my model with a...
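
To illustrate the reported effect, a sketch of a benchmark loop over num_hypotheses values; the model path and prompt tokens are hypothetical, and beam_size is raised alongside num_hypotheses since num_hypotheses typically cannot exceed beam_size in a standard beam search:

```py
import time
import ctranslate2

generator = ctranslate2.Generator("llama2-7b-chat-ct2", device="cuda", compute_type="int8")
prompt = [["<s>", "▁Hello"]]  # pre-tokenized prompt (tokenizer omitted)

for n in (1, 2, 4, 8):
    start = time.perf_counter()
    generator.generate_batch(prompt, beam_size=n, num_hypotheses=n, max_length=128)
    print(f"num_hypotheses={n}: {time.perf_counter() - start:.2f}s")
```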

The new UMT5 models from Google are currently the most interesting variant of the original T5 models. However, trying to convert a UMT5 model using the Transformers converter by running `ct2-transformers-converter --model...

This could be used for LLMs, and hopefully also for encoder-decoder models, e.g. pairing a smaller NLLB model with a bigger NLLB model

enhancement