Wagtail

56 comments by Wagtail

In the long term, I am looking into better translation support by LLMs like [Tower by Unbabel](https://unbabel.com/de/announcing-tower-an-open-multilingual-llm-for-translation-related-tasks/). Though it will take additional steps until we have general models with this capability.

[NLLB models](https://huggingface.co/models?sort=trending&search=NLLB+distilled+ct2) are also supported by [CTranslate2](https://opennmt.net/CTranslate2/guides/transformers.html#nllb). They support up to 200 languages but are an order of magnitude bigger.
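
For reference, the translation flow follows the linked CTranslate2 guide. A minimal sketch, assuming the checkpoint has already been converted with `ct2-transformers-converter` (or downloaded pre-converted from the Hugging Face search above); the directory name and language codes are placeholders:

```python
import ctranslate2
import transformers

src_lang = "eng_Latn"
tgt_lang = "deu_Latn"

# Assumption: converted with
#   ct2-transformers-converter --model facebook/nllb-200-distilled-600M --output_dir nllb-200-distilled-600M-ct2
translator = ctranslate2.Translator("nllb-200-distilled-600M-ct2", device="cpu")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang=src_lang
)

# NLLB takes the source as tokens and the target language code as a decoding prefix.
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world!"))
results = translator.translate_batch([source], target_prefix=[[tgt_lang]])
target = results[0].hypotheses[0][1:]  # drop the language-code token

print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
```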

Doesn't it work as a decoder model? I have successfully run Mistral (with lots of redundant shortcuts). The architecture should be similar.

The code is similar to the GPT example in this repo:

```python
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view
from bertviz import model_view

# load the model
# ...
```
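
For context, a fuller version of that snippet might look like the following. A minimal sketch, assuming `gpt2` as a stand-in decoder model (a Mistral or Llama checkpoint loads the same way, just larger) and a Jupyter environment for the BertViz widgets:

```python
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view, model_view

# Assumption: gpt2 stands in for any Hugging Face decoder model.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
outputs = model(**inputs)

attention = outputs.attentions  # tuple: one (batch, heads, seq, seq) tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Renders an interactive widget in a Jupyter notebook.
head_view(attention, tokens)
# model_view(attention, tokens)  # all layers and heads at once
```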

@Icamd Does https://github.com/mattneary/attention work well with bigger GPTs? Do you know how the attention weights are aggregated into one view? @MarioRicoIbanez Can we use captum to view the attention pattern?

Most of the time, the Llama 3 model sinks almost all attention onto the begin-of-text token. It is possible to load the model in 4-bit or 8-bit quantization...
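
A minimal sketch of loading with 4-bit or 8-bit weights via `BitsAndBytesConfig`; the model id is an assumption (Llama 3 checkpoints are gated on Hugging Face, and any causal LM loads the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: placeholder model id; requires access to the gated Llama 3 repo.
model_id = "meta-llama/Meta-Llama-3-8B"

# 4-bit quantization via bitsandbytes; use load_in_8bit=True for 8-bit instead.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    attn_implementation="eager",  # eager attention actually returns the attention weights
    output_attentions=True,
)
```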

You could zoom out in the browser. That isn't a perfect solution, but it could be a workaround in some use cases. A better solution could be collapsible sections.

There are also German and Russian datasets published in [clang8](https://github.com/google-research-datasets/clang8/issues/10).

Yes, how did you cite it?

Hey, thank you! Perhaps the text is listed in [Open Speech and Language Resources](http://www.openslr.org/25)?