OlivierDehaene

Results: 119 comments by OlivierDehaene

That's very interesting, thanks for sharing. It seems to be a bug in the code that generates the traceID. We have never run into this issue on our side (since we...

Can you elaborate on the issues you faced while running text-generation-inference? We have been running it since October in EKS, AzureML (which uses k8s as a backend) and K3s, and...

@sam-h-bean, we can put you in contact with our experts in our [Expert Acceleration Program](https://huggingface.co/support) if you need any help setting up text-generation-inference in your production environment.

You can use the benchmarking tool to make sure that you don't OOM at a given setting, and then use those settings as the maximum values in the launcher.

All GPU operations are asynchronous, so this difference may be an artifact of that. See https://pytorch.org/tutorials/recipes/recipes/benchmark.html#pytorch-benchmark for example. It's expensive because in this case the tensor is of size `[N_TOKENS,...
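To illustrate the point about asynchronous GPU execution, here is a minimal sketch (not the text-generation-inference code itself) of timing a CUDA op correctly, in the spirit of the linked PyTorch benchmark recipe. The tensor shape and the matmul workload are illustrative assumptions only, and it assumes a CUDA device is available.

```python
import time

import torch
import torch.utils.benchmark as benchmark

x = torch.randn(4096, 4096, device="cuda")

# Naive wall-clock timing is misleading: CUDA kernels are launched
# asynchronously, so measuring around the call may only capture the
# launch overhead. torch.utils.benchmark.Timer synchronizes for you.
t = benchmark.Timer(
    stmt="x @ x",
    globals={"x": x},
)
print(t.timeit(100))

# Equivalent manual approach: synchronize before reading the clock,
# so the measurement includes the actual kernel execution time.
torch.cuda.synchronize()
start = time.perf_counter()
y = x @ x
torch.cuda.synchronize()
print(f"elapsed: {time.perf_counter() - start:.4f}s")
```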

This should not have happened and is fixed in v1.2.2. The Dockerfile was incorrectly updated.

@jtourille, could you share your solution with the community here? Cheers!

> Sometimes it happens, sometimes it doesn't happen. It may be a problem with my operating environment.

On the same server?

What's your use case for these models? Their throughput is so low and the costs so prohibitive that I don't see any.

Super interesting! Can you try:

```yaml
version: '3.4'
services:
  multiminilml12v2:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.0
    environment:
      - MODEL_ID=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
      - NVIDIA_DISABLE_REQUIRE=1
      - RUST_BACKTRACE=full
      - JSON_OUTPUT=true
      - PORT=18083
      - MAX_BATCH_TOKENS=65536
      - MAX_CLIENT_BATCH_SIZE=1024 # interesting variables...
```