CTranslate2
Fast inference engine for Transformer models
As CTranslate2 now supports quantized 8-bit LLMs like OPT, are there any plans to include model parallelism to split a model's layers across multiple GPUs or GPU+CPU to meet the...
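CTranslate2 does not ship this today, but the core of the request is straightforward to sketch: partition a stack of layers into contiguous stages, one per device. A minimal device-agnostic sketch of that partitioning step (the device names are placeholders, not CTranslate2 API):

```python
def partition_layers(layers, devices):
    # Split a stack of layers into contiguous stages, one per device,
    # as evenly as possible -- the layout step of naive pipeline/model
    # parallelism. Activations would then flow stage to stage at runtime.
    n, k = len(layers), len(devices)
    stages, start = [], 0
    for i, dev in enumerate(devices):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder
        stages.append((dev, layers[start:start + size]))
        start += size
    return stages

# e.g. a 10-layer decoder over three GPUs and the CPU:
stages = partition_layers(list(range(10)), ["gpu0", "gpu1", "gpu2", "cpu"])
```

The hard part in practice is not the split but moving activations (and the KV cache) between devices without stalling, which is why the feature is non-trivial for an inference engine.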
Hi, currently the decoder produces sentence-level scores; instead of just outputting the average, another option would be to produce the score of each word/token. Beam search might be a harder...
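To make the request concrete: the sentence-level score is typically the average of per-token log-probabilities, so returning the per-token values alongside the average costs nothing extra. A small illustration (the log-prob values are hypothetical):

```python
def sentence_score(token_log_probs):
    # Sentence-level score as the average per-token log-probability,
    # which is what the decoder currently reports.
    return sum(token_log_probs) / len(token_log_probs)

# Hypothetical per-token log-probs for a 4-token hypothesis.
token_log_probs = [-0.1, -2.3, -0.05, -0.4]

# Exposing the per-token list lets a caller spot the low-confidence
# second token, which the average alone hides.
avg = sentence_score(token_log_probs)
worst = min(token_log_probs)
```

With beam search the bookkeeping is indeed harder, since each surviving hypothesis carries its own token-score history across pruning steps.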
Hello. So, I want to run the NLLB-200 (3.3B) model on a server with 4x 3090 and, say, a 16-core AMD Epyc CPU. I wrapped CTranslate2 in FastAPI, running with...
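For this setup, CTranslate2's `Translator` already supports data parallelism (one model replica per GPU) via `device_index`, which pairs well with a FastAPI worker pool. A minimal sketch, assuming a locally converted NLLB model directory (the path is a placeholder):

```python
def make_translator(model_dir):
    # Imported lazily so the helper below stays usable without the library.
    import ctranslate2

    # Listing several GPUs in device_index creates one replica per device;
    # incoming batches are dispatched across them (data parallelism, not
    # model parallelism -- each 3090 must hold the full 3.3B model).
    return ctranslate2.Translator(
        model_dir,
        device="cuda",
        device_index=[0, 1, 2, 3],
        inter_threads=4,  # allow 4 batches in flight, one per replica
    )

def nllb_target_prefix(lang_code, batch_size):
    # NLLB models expect the target language token (e.g. "fra_Latn") as
    # the first decoder token of every hypothesis, passed as target_prefix
    # to translate_batch.
    return [[lang_code]] * batch_size
```

Usage would be along the lines of `translator.translate_batch(tokens, target_prefix=nllb_target_prefix("fra_Latn", len(tokens)))` after tokenizing with the NLLB SentencePiece model.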
Hello Authors, I apologise for asking a question unrelated to an issue with the repo; however, would you consider supporting a newer paradigm I came across whilst reading a recent [paper](https://www.researchgate.net/publication/367557918_Understanding_INT4_Quantization_for_Transformer_Models_Latency_Speedup_Composability_and_Failure_Cases)?...
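The paradigm in question is INT4 weight quantization. The core mechanic is the same as the existing INT8 path but with a [-8, 7] range, which is why the paper frames it as a latency/accuracy trade-off. A minimal symmetric per-tensor sketch of the idea (not CTranslate2's actual kernels):

```python
import numpy as np

def quantize_int4(w):
    # Symmetric quantization to the signed 4-bit range [-8, 7]:
    # the scale maps the largest magnitude onto 7, then values are
    # rounded and clipped. Stored here in int8 for simplicity; a real
    # kernel would pack two values per byte.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    return q.astype(np.float32) * scale
```

With only 16 representable levels, the rounding error per weight is bounded by half a scale step, which is why INT4 usually needs per-group scales (rather than per-tensor, as above) to stay accurate on large models.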
Related to #1349.
Hi, I want to use this lib to get encodings (at all positions) from the flan-T5 encoder on CPU. But I am not familiar with C++, so it is hard...
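No C++ should be needed for this: CTranslate2's Python API exposes an `Encoder` class whose `forward_batch` returns per-position hidden states. Whether flan-T5's encoder can be converted to that encoder-only format is an assumption to check against the converter's supported architectures; the model path below is a placeholder:

```python
def t5_encoder_tokens(sp_tokens):
    # T5 expects the end-of-sequence token appended to the encoder input.
    return sp_tokens + ["</s>"]

def encode_on_cpu(model_dir, token_batches):
    # Imported lazily so the token helper above works without the library.
    import ctranslate2

    # Encoder/forward_batch is the documented entry point for encoder-only
    # models; running flan-T5's encoder through it is an assumption here.
    encoder = ctranslate2.Encoder(model_dir, device="cpu")
    output = encoder.forward_batch(token_batches)
    # last_hidden_state holds the encoding at every input position.
    return output.last_hidden_state
```

The returned storage can be viewed as a NumPy array on CPU, giving the per-position flan-T5 encodings without touching C++.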
Context: With HF models, one can use [peft](https://github.com/huggingface/peft) to do parameter-efficient tuning, the most popular (and AFAIK most performant) method being LoRA. Idea: It would be great to be...
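For context on why this fits an inference engine: LoRA trains only a low-rank update B·A on top of a frozen weight W, and for inference that update can be merged back into W, so a merged checkpoint needs no special runtime support. A minimal NumPy sketch of both forms (names and shapes are illustrative, not the peft API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    # Base projection plus the low-rank update:
    #   y = x W^T + (alpha / r) * x A^T B^T
    # Only A (r x in) and B (out x r) are trained; W (out x in) is frozen,
    # which is what makes the method parameter-efficient.
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def merge_lora(W, A, B, alpha):
    # Fold the update into the frozen weight for inference:
    #   W' = W + (alpha / r) * B A
    # A merged W' behaves exactly like the adapted model, so it can be
    # converted (and quantized) like any ordinary checkpoint.
    r = A.shape[0]
    return W + (alpha / r) * B @ A
```

Serving many adapters over one shared base model without merging is the harder feature, since the B·A product must then be applied per request at runtime.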