
Fast inference engine for Transformer models

Results: 173 CTranslate2 issues

I am struggling to load a quantized model because I lack sufficient CPU memory to hold the weights. Usually I would split the weights into multiple shards and then load them...

enhancement
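
For context, the shard-then-load pattern the reporter describes can be sketched in plain PyTorch; the file names and the two-way split below are illustrative, and this is not a CTranslate2 API:

```py
import torch

# Split a checkpoint's state dict into shards so each file is smaller...
state_dict = torch.load("model.pt")  # hypothetical single-file checkpoint
keys = list(state_dict)
half = len(keys) // 2
torch.save({k: state_dict[k] for k in keys[:half]}, "shard0.pt")
torch.save({k: state_dict[k] for k in keys[half:]}, "shard1.pt")

# ...then restore shard by shard instead of reading one huge file at once.
merged = {}
for shard in ("shard0.pt", "shard1.pt"):
    merged.update(torch.load(shard))
```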

Hi, Is there any way to convert DeBERTa models to CTranslate2, since the architecture is somewhat close to BERT? Currently the conversion command ct2-transformers-converter produces this error: No conversion is registered...

enhancement
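
For reference, the failing conversion can presumably also be reproduced through the Python converter API; the checkpoint name below is only an example:

```py
from ctranslate2.converters import TransformersConverter

# Example checkpoint; any DeBERTa model should hit the same error.
converter = TransformersConverter("microsoft/deberta-v3-base")
converter.convert("deberta_ct2")  # reportedly fails: "No conversion is registered..."
```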

I am wondering if there is any way to manually clear the static prompt cache for generator.generate_tokens. We are running an algorithm where a lot of computations can be...
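
For reference, a minimal sketch of how the static prompt cache gets populated, assuming a converted model in model_dir and pre-tokenized inputs; Generator.generate_batch exposes static_prompt and cache_static_prompt options, but no public method to clear the cached state appears to be documented, which is what this issue asks for:

```py
import ctranslate2

generator = ctranslate2.Generator("model_dir", device="cuda")  # hypothetical model path

system = ["▁You", "▁are", "▁helpful", "."]  # static prefix, cached across calls
prompt = ["▁Hello", "!"]                    # per-request tokens

# The static prompt state is computed once and reused by later generations.
results = generator.generate_batch(
    [prompt],
    static_prompt=system,
    cache_static_prompt=True,
    max_length=64,
)
print(results[0].sequences[0])
```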

Hi. I have been doing some benchmarks on an NVIDIA V100 32GB GPU. First, I benchmarked Llama2-7B-chat using Hugging Face Transformers and CTranslate2. I saw reduced latency when using CT2 (12...
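
A minimal latency measurement on the CTranslate2 side might look like the sketch below; the model path and the pre-tokenized prompt are hypothetical, and the tokenizer step is omitted:

```py
import time
import ctranslate2

generator = ctranslate2.Generator("llama2-7b-chat-ct2", device="cuda")  # hypothetical path
prompt = [["<s>", "▁Hello", ",", "▁world"]]  # pre-tokenized prompt (tokenizer omitted)

start = time.perf_counter()
generator.generate_batch(prompt, max_length=128)
print(f"latency: {time.perf_counter() - start:.2f}s")
```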

Hi, It would be great to support constrained generation, following the work of https://github.com/outlines-dev/outlines. This should not be too much work to integrate, but this should probably not be in...

enhancement

```py
import torch
import ctranslate2

x0 = torch.ones((2, 4), dtype=torch.int32, device="cuda:0")
y0 = ctranslate2.StorageView.from_array(x0)
print(f"Original tensor is on {x0.device} and StorageView is on {y0.device}:{y0.device_index}")

x1 = torch.ones((2, 4), dtype=torch.int32, device="cuda:1")
...
```

enhancement

Do you have any plans to support ELECTRA models? If there are any contributions I can make to help with this, I would be glad to. Thank you

enhancement

Model: Llama-2-7b-chat. CT2 version: 3.19.0. I found that when I use an int8* quantization, the inference speed depends drastically on num_hypotheses. I tried to benchmark my model with a...
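
To illustrate the reported effect, a sketch of a benchmark loop over num_hypotheses values; the model path and prompt tokens are hypothetical, and beam_size is raised alongside num_hypotheses since num_hypotheses typically cannot exceed beam_size in a standard beam search:

```py
import time
import ctranslate2

generator = ctranslate2.Generator("llama2-7b-chat-ct2", device="cuda", compute_type="int8")
prompt = [["<s>", "▁Hello"]]  # pre-tokenized prompt (tokenizer omitted)

for n in (1, 2, 4, 8):
    start = time.perf_counter()
    generator.generate_batch(prompt, beam_size=n, num_hypotheses=n, max_length=128)
    print(f"num_hypotheses={n}: {time.perf_counter() - start:.2f}s")
```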

The new UMT5 models from Google are currently the most interesting variant of the original T5 models. However, trying to convert a UMT5 model using the Transformers converter by running `ct2-transformers-converter --model...

This could be used for LLMs, and hopefully also for encoder-decoder models, e.g. pairing a smaller NLLB model with a bigger NLLB model

enhancement