red thing
Where is iree-torch?
I still have this issue.
I am having this issue too.
See related technique: https://github.com/ggerganov/llama.cpp/issues/528
The true context window of transformers is effectively baked into the model. It's an inherent property, tied to how the model learns positional embeddings and all of...
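To make the point about positional embeddings concrete, here is a toy sketch. It assumes *learned absolute* positional embeddings, which is only one scheme (rotary or ALiBi-style models behave differently), and the class and function names are made up for illustration, not taken from any library. The table gets one row per position seen in training, so there is simply nothing to look up past that length.

```python
import random


class LearnedPositionalEmbedding:
    """Toy learned absolute positional embedding table (hypothetical, illustrative only)."""

    def __init__(self, max_positions, dim, seed=0):
        rng = random.Random(seed)
        self.max_positions = max_positions
        # One row per position the model was trained on; no rows exist beyond that.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(max_positions)
        ]

    def lookup(self, seq_len):
        """Return the embeddings for positions 0..seq_len-1."""
        if seq_len > self.max_positions:
            raise ValueError(
                f"sequence length {seq_len} exceeds the trained "
                f"context window of {self.max_positions}"
            )
        return self.table[:seq_len]


def fits_context(emb, seq_len):
    """True if a sequence of this length has a learned embedding for every position."""
    try:
        emb.lookup(seq_len)
        return True
    except ValueError:
        return False


# Assumed toy sizes: a context window of 8 positions, 4-dimensional embeddings.
emb = LearnedPositionalEmbedding(max_positions=8, dim=4)
```

With this setup, `fits_context(emb, 8)` succeeds and `fits_context(emb, 9)` fails. You could grow the table, but the new rows would be untrained, which is roughly why extending the window usually needs fine-tuning or a different positional scheme.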
There's bert.cpp if you want to run BERT models. But everything has architecture specifics so you have to wait for someone to write a layer/inference implementation. It's actually quite complicated...
No, there is no "documentation"; the closest thing is the GGML examples, and they are not really simple.
Since [ggerganov said](https://github.com/ggerganov/ggml/pull/12#issuecomment-1542871693) this is no longer a priority, I recommend [CTranslate2](https://github.com/OpenNMT/CTranslate2) to do fast Python-free CPU inference of T5 family models.
I have the same issue.
Seeing this as well