Nicolas Patry
I added the LICENSE file.
Is it possible you are on a 32-bit system: https://stackoverflow.com/questions/10561368/i-have-enough-memory-but-mmap-keeps-failing-saying-cannot-allocate-memory ? The tensors **are** lazily loaded, but here it seems mmap itself is failing. For `torch` (`pt`) we are not...
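A quick way to check the 32-bit question, as a minimal sketch using only the Python standard library. The helper names `pointer_bits` and `is_64bit` are made up for illustration and are not part of any library mentioned here. A 32-bit interpreter has only a few GB of virtual address space, so mapping a large file can fail even when plenty of RAM is free.

```python
import struct
import sys


def pointer_bits() -> int:
    """Return the pointer width of the running interpreter, in bits."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


def is_64bit() -> bool:
    """Return True when the interpreter is a 64-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"pointer width: {pointer_bits()} bits, 64-bit build: {is_64bit()}")
```

Note that this reports the bitness of the Python build, not of the operating system: a 32-bit Python on a 64-bit OS still has the small address space.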
Hey, I can understand the frustration, sorry for that. The fallback exists, but for some reason there's a hard crash when the fallback executes on this particular model. But we should definitely...
Hi, I am not well versed in DeepSpeed, but the error probably lies with them (or in your code and understanding of what's possible). ```python ValueError: Expected a dict of...
We should probably also stop running as root by default actually...
Okay this is the fix: ``` FROM tgi # Adds a non-root llm user to its own group for isolation ENV UID=1000 ENV USER=llm RUN groupadd -g "${UID}" "${USER}" &&...
If I use `* 1000` instead of `* 100` this is what I get on my small machine: ``` slow: 7.805477857589722 fast: 7.280818223953247 ``` In general we don't look too...
Sorry for the delay, this PR was missed. We already support modifying everything through the CLI normally: https://huggingface.co/docs/text-embeddings-inference/cli_arguments and this PR doesn't seem to be changing major functionality, no...
What does `Nw` mean?
`return_full_text` is a legacy option linked to the initial `transformers.pipelines` implementation (something like 4+ years ago). We had API dependencies on that behavior and therefore implemented it here, it can mostly be...