Alex

Results 25 comments of Alex

@VeeDel This issue is about optimization for CPU inference. Perhaps you should find a similar issue or create a new one.

I noticed that the https://libretranslate.com/ site still has the old models for the Russian language, but the index has now been updated to the new v1.9, and there are no such problems with...

I faced the same problem. Unfortunately, `python/ctranslate2/converters/opennmt_py.py` currently only supports ALiBi or RoPE for `decoder_type == "transformer_lm"` (LLMs) and does not support them for seq2seq models. Also, unfortunately, there is no support...
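If you want to check what a given checkpoint was trained with before attempting a conversion, something along these lines works (a rough sketch: the checkpoint path is just a placeholder, and the exact opt field names depend on your OpenNMT-py version):

```python
# Sketch: peek at the training options stored in an OpenNMT-py checkpoint.
import torch

ckpt = torch.load("model_step_10000.pt", map_location="cpu")  # placeholder path
opt = ckpt["opt"]
print(getattr(opt, "decoder_type", None))            # e.g. "transformer" vs "transformer_lm"
print(getattr(opt, "max_relative_positions", None))  # RPE/RoPE/ALiBi convention varies by version
```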

Here are more specific numbers and observations:
1. Ultimately, the models with RPE (relative positional embeddings) and RoPE converged.
2. However, as I said earlier, the tok/s speed of both...

> there is no beta for silu

I apologize here, I got a little mixed up. By beta I meant the coefficient of the [Swish function](https://en.wikipedia.org/wiki/Swish_function). When beta = 1 it will...
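Concretely, Swish with a beta coefficient is `x * sigmoid(beta * x)`, and SiLU is just the beta = 1 special case. A tiny numpy check (illustrative only):

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish_beta(x) = x * sigmoid(beta * x)
    return x * (1.0 / (1.0 + np.exp(-beta * x)))

x = np.linspace(-3.0, 3.0, 7)
silu = x / (1.0 + np.exp(-x))        # SiLU(x) = x * sigmoid(x)
assert np.allclose(swish(x, beta=1.0), silu)
```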

> can you test to convert your onmt-py model with #1687 (the rope one with gated-gelu) and tell me if it works ?

```
Converting to ctranslate2
Traceback (most recent...
```

> > And there seems to be some confusion in the names. It seems gated silu/swish should be called SwiGLU, if I'm not mistaken.
>
> so did you try...
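(For reference on the naming: a gated feed-forward block that uses SiLU/Swish as the gate activation is what is usually written as SwiGLU. A minimal numpy sketch, with illustrative names and shapes:)

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # "Gated SiLU/Swish" feed-forward, usually called SwiGLU:
    # FFN(x) = (SiLU(x @ W_gate) * (x @ W_up)) @ W_down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
x = rng.normal(size=(2, d_model))
out = swiglu_ffn(
    x,
    rng.normal(size=(d_model, d_ff)),
    rng.normal(size=(d_model, d_ff)),
    rng.normal(size=(d_ff, d_model)),
)
print(out.shape)  # (2, 8)
```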

> > `TypeError: MultiHeadAttentionSpec.__init__() got an unexpected keyword argument 'head_dim'`
>
> this one does not make sense. check your /mnt/DeepLearning/Locomotive/venv/lib/python3.11/site-packages/ctranslate2/specs/attention_spec.py file and let me know it's up to date...
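One quick way to verify that (a small sketch; run it inside the same venv):

```python
# Check which CTranslate2 is installed and whether the spec's constructor
# already accepts the newer keyword arguments.
import inspect
import ctranslate2
from ctranslate2.specs import attention_spec

print(ctranslate2.__version__)
print(inspect.signature(attention_spec.MultiHeadAttentionSpec.__init__))
```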

Yes, I think you're right! The models on which I saw such a difference differ quite significantly from the standard Transformer base. Here's a general description:
1. Effective batch size...

@vince62s I updated CTranslate2 and the conversion went through without errors. The only thing is that I kept the change that adds "gated-gelu" to `_SUPPORTED_ACTIVATIONS`:

```
_SUPPORTED_ACTIVATIONS =...
```
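In full, the edit looks roughly like this (a sketch, not the verbatim converter code; the surrounding entries and the `Activation` member the gated variant should map to may differ between CTranslate2 versions):

```python
# In ctranslate2/converters/opennmt_py.py (sketch; entries are illustrative).
from ctranslate2.specs import common_spec

_SUPPORTED_ACTIVATIONS = {
    "gelu": common_spec.Activation.GELU,
    "relu": common_spec.Activation.RELU,
    "silu": common_spec.Activation.SWISH,
    # Local addition so the converter accepts a gated-gelu config;
    # which Activation member is correct depends on how the GLU gating
    # is handled elsewhere in the spec.
    "gated-gelu": common_spec.Activation.GELU,
}
```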