
Fast inference engine for Transformer models

Results 173 CTranslate2 issues

As CTranslate2 now supports quantized 8-bit LLMs like OPT, are there any plans to include model parallelism, i.e. splitting a model's layers across multiple GPUs or across GPU and CPU, to meet the...

enhancement
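The request above amounts to pipeline-style model parallelism: each device holds only a contiguous slice of the layers. A minimal sketch of the partitioning step (this is a conceptual illustration, not a CTranslate2 API; the device names and layer count are assumptions):

```python
# Conceptual sketch: pipeline-style model parallelism assigns contiguous
# slices of a model's layers to different devices, so each device only
# needs memory for its own slice.

def partition_layers(num_layers, devices):
    """Split layer indices into contiguous, near-equal slices, one per device."""
    base, extra = divmod(num_layers, len(devices))
    assignment, start = {}, 0
    for i, dev in enumerate(devices):
        count = base + (1 if i < extra else 0)
        assignment[dev] = list(range(start, start + count))
        start += count
    return assignment

# Example: a hypothetical 24-layer OPT-style decoder over two GPUs and the CPU.
plan = partition_layers(24, ["cuda:0", "cuda:1", "cpu"])
```

At inference time, activations would then flow through the slices in order, which is what makes the GPU+CPU split in the request possible at the cost of cross-device transfers.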

Hi, currently the decoder produces sentence-level scores. Instead of just outputting the average, another option would be to produce the score of each word/token. Beam search might be a harder...

enhancement
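To make the request concrete: a decoder that already computes per-token log-probabilities can expose them directly, since the sentence-level score is just their average. A minimal sketch (the token log-probs are made-up numbers for illustration, not CTranslate2 output):

```python
# Sketch: the sentence score is the mean of the per-token log-probabilities,
# so returning the per-token values is strictly more informative.

def sentence_score(token_log_probs):
    return sum(token_log_probs) / len(token_log_probs)

token_log_probs = [-0.1, -2.3, -0.4]          # hypothetical scores for 3 tokens
avg = sentence_score(token_log_probs)          # what a sentence-level API reports
per_token = list(zip(["the", "cat", "sat"], token_log_probs))  # requested output
```

Per-token scores are useful for spotting exactly where a translation degrades (here, the hypothetical second token).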

Hello. So, I want to run the NLLB-200 (3.3B) model on a server with 4x 3090 GPUs and, say, a 16-core AMD Epyc CPU. I wrapped CTranslate2 in FastAPI, running with...
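One common pattern for a multi-GPU serving setup like the one described is to load one model replica per GPU and hand each request to whichever replica is free. A minimal sketch of the dispatch logic (the `DummyTranslator` class is a stand-in for a real per-GPU model object; it and the device names are assumptions, not CTranslate2 or FastAPI code):

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

# Sketch: a pool of per-GPU model replicas behind a blocking queue.
# A web handler (e.g. a FastAPI endpoint) would call handle() per request.

class DummyTranslator:
    """Stand-in for a per-GPU model; real code would load one model per device."""
    def __init__(self, device):
        self.device = device
    def translate(self, text):
        return f"{self.device}:{text[::-1]}"   # placeholder "translation"

replicas = Queue()
for i in range(4):                             # one replica per GPU
    replicas.put(DummyTranslator(f"cuda:{i}"))

def handle(text):
    translator = replicas.get()                # block until a replica is free
    try:
        return translator.translate(text)
    finally:
        replicas.put(translator)               # return the replica to the pool

pool = ThreadPoolExecutor(max_workers=4)
results = list(pool.map(handle, ["hola", "bonjour", "ciao", "hallo", "ahoj"]))
```

The queue caps in-flight work per GPU at one request, which keeps GPU memory bounded; batching requests before dispatch is the usual next step for throughput.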

Hello Authors, I apologise for asking a question unrelated to an issue with the repo; however, would you consider supporting a newer paradigm I came across whilst reading a recent [paper](https://www.researchgate.net/publication/367557918_Understanding_INT4_Quantization_for_Transformer_Models_Latency_Speedup_Composability_and_Failure_Cases)?...

enhancement
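For context, the INT4 scheme studied in the linked paper maps weights to 4-bit integers with a shared scale. A minimal sketch of symmetric per-tensor INT4 quantization (the values are illustrative, and real implementations typically use per-channel or per-group scales):

```python
import numpy as np

# Sketch of symmetric INT4 quantization: map floats to integers in [-8, 7]
# using a per-tensor scale, then dequantize by multiplying the scale back.

def quantize_int4(w):
    scale = float(np.abs(w).max()) / 7.0        # 7 = largest positive int4 value
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.02, -0.7, 0.35, 0.14], dtype=np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)               # close to w, within half a step
```

Storage drops to 4 bits per weight plus one scale; the round-trip error per weight is at most half the scale step, which is why outlier handling dominates INT4 accuracy in practice.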

Hi, I want to use this lib to get encodings (at all positions) from the Flan-T5 encoder on CPU. But I am not familiar with C++, so it is hard...

Context: with HF models, one can use [peft](https://github.com/huggingface/peft) to do parameter-efficient tuning, the most popular (and, as far as I know, most performant) method being LoRA. Idea: it would be great to be...

enhancement
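For readers unfamiliar with LoRA: instead of updating a full weight matrix, it trains a low-rank correction on top of the frozen weight, so only a small fraction of the parameters change. A minimal sketch of the idea (the shapes, rank, and scaling factor are illustrative and not peft's implementation):

```python
import numpy as np

# Sketch of LoRA: freeze the base weight W and learn a low-rank update B @ A,
# so only r * (d_in + d_out) parameters are trained instead of d_in * d_out.

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01       # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init => no-op at start

def lora_forward(x, alpha=8.0):
    # Base projection plus the scaled low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)                         # equals W @ x while B is all zeros
```

With zero-initialized `B` the adapted model starts out identical to the base model, which is what makes LoRA fine-tuning stable; here only 512 of the 4096 weight parameters would be trained.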