CTranslate2
Fast inference engine for Transformer models
When oneDNN and OpenBLAS are both enabled, `wyoming-faster-whisper` does not start; it works only with OpenBLAS enabled: https://github.com/NixOS/nixpkgs/pull/237788#issuecomment-1592070847 On the other hand, `libretranslate` also breaks when oneDNN and OpenBLAS...
When converting the model, I enabled quantization to `int8`, but I noticed the converted model's performance dropped by 5 BLEU points. Therefore, I...
Support adding stop words as sequences of tokens. For example, in Python code generation, it is fairly common to add the following tokens as stop words: ```py stop_words = [ "\ndef", "\nclass",...
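Until such an option exists, the effect can be approximated after decoding. A minimal sketch (plain Python, independent of any particular inference API; names are illustrative) of truncating generated text at the earliest stop sequence:

```python
# Sketch: cut generated text at the first occurrence of any stop sequence.
# This is post-hoc truncation, not early stopping inside the decoder.

def truncate_at_stop_words(text, stop_words):
    """Return `text` truncated at the earliest stop sequence, if any."""
    cut = len(text)
    for stop in stop_words:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

stop_words = ["\ndef", "\nclass"]
generated = "    return x + 1\ndef helper():\n    pass"
truncate_at_stop_words(generated, stop_words)  # -> "    return x + 1"
```

Note this still pays the cost of generating the discarded tokens; native stop-sequence support in the decoding loop is what the request is asking for.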
DialoGPT reads a user input and then generates text until the EOS token appears. But for DialoGPT, EOS can appear multiple times, and after every response, to keep the context you...
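A hedged sketch of the usual DialoGPT convention, where EOS delimits turns rather than ending the conversation: all previous turns (user and bot, alternating) are joined with the EOS token to form the next prompt.

```python
# Sketch: multi-turn context assembly for DialoGPT-style chat models.
# EOS acts as a turn separator, not an end-of-conversation marker.
EOS = "<|endoftext|>"  # DialoGPT reuses the GPT-2 EOS token

def build_context(turns):
    """Join alternating user/bot turns with EOS, ending with EOS so the
    model generates the next turn."""
    return EOS.join(turns) + EOS

history = ["Hi there!", "Hello, how can I help?", "Tell me a joke."]
build_context(history)
# "Hi there!<|endoftext|>Hello, how can I help?<|endoftext|>Tell me a joke.<|endoftext|>"
```

Generation then stops at the *next* EOS, which closes one turn; the loop repeats with the bot's reply appended to `history`.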
The Assisted Generation feature from https://huggingface.co/blog/assisted-generation seems to improve quantized inference latency a lot.
A new family of Salesforce coding models has been released: https://huggingface.co/Salesforce/codet5p-2b/ Context: [CodeT5+](https://github.com/salesforce/CodeT5/tree/main/CodeT5+) is a new family of open code large language models with an encoder-decoder architecture that can flexibly...
framework: OpenNMT-tf
model:
```python
class TinyDualSourceTransformer(onmt.models.Transformer):
    def __init__(self):
        super(TinyDualSourceTransformer, self).__init__(
            source_inputter=onmt.inputters.ParallelInputter([
                onmt.inputters.WordEmbedder(embedding_size=256),
                onmt.inputters.WordEmbedder(embedding_size=256)]),
            target_inputter=onmt.inputters.WordEmbedder(embedding_size=256),
            num_layers=4,
            num_units=128,
            num_heads=4,
            ffn_inner_dim=512,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_encoders=True)

    def auto_config(self, num_replicas=1):
        config = super(TinyDualSourceTransformer, self).auto_config(num_replicas=num_replicas)
        max_length...
```
Hello team, Today, batch generation works like the HF `generate()` function: it accepts several input texts, but generation parameters (like temperature, top-k, etc.) apply to the whole batch, so...
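A common workaround, sketched below in plain Python (names are illustrative), is to group requests by identical sampling parameters so each group can still be submitted as one batch:

```python
# Sketch: bucket requests by their generation parameters so each bucket
# can be batched with a single parameter set. Parameter names are
# illustrative, not tied to any particular API.
from collections import defaultdict

def group_by_params(requests):
    """requests: list of (text, params_dict) pairs.
    Returns a dict mapping a hashable parameter key to the list of texts."""
    groups = defaultdict(list)
    for text, params in requests:
        key = tuple(sorted(params.items()))  # order-insensitive, hashable
        groups[key].append(text)
    return dict(groups)

requests = [
    ("translate A", {"temperature": 0.7, "top_k": 40}),
    ("translate B", {"temperature": 1.0, "top_k": 0}),
    ("translate C", {"top_k": 40, "temperature": 0.7}),  # same params as A
]
group_by_params(requests)
# texts A and C land in one bucket, B in another
```

This preserves some batching efficiency, but true per-example parameters inside one batch would avoid the extra passes entirely, which is what the request above is asking for.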