Stefano Luoni

Results: 4 comments by Stefano Luoni

I have the same problem with jina-embeddings-v3. I never send batches of more than 8–10 strings, and never exceed 8k tokens per request. However, when the model is hit with...
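For reference, the client-side batching described above can be sketched like this (the helper name and batch limit are illustrative, not from any library):

```python
def batched(items, size=8):
    """Yield successive batches of at most `size` items.

    Used here to keep each embeddings request under the
    8-10 strings mentioned above; the limit is a guess at
    what keeps the server's memory stable, not a hard API cap.
    """
    for i in range(0, len(items), size):
        yield items[i : i + size]

texts = [f"document {n}" for n in range(25)]
batches = list(batched(texts, size=8))
# 25 inputs split into batches of 8, 8, 8, and 1
```

Each batch would then be sent as a separate embeddings request instead of one large one.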

After some further tests, it looks like setting batch-size = 4 prevents the memory from growing indefinitely. This, however, comes with a performance degradation of roughly 50% in requests processed per...
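A minimal sketch of that workaround, assuming the server is Hugging Face text-embeddings-inference (the exact flag name differs between serving stacks, so treat this as illustrative):

```shell
# Cap how many requests are grouped into one forward pass.
# --max-batch-requests is TEI's option; other servers expose
# a similar knob under a different name.
text-embeddings-router \
  --model-id jinaai/jina-embeddings-v3 \
  --max-batch-requests 4
```

The trade-off is exactly the one noted above: smaller batches keep memory bounded but cut throughput.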

I was initially thinking of passing a `late_chunking` parameter with the embedding request, but since such a parameter does not exist in the OpenAI specification, I would instead add an...
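To make the rejected option concrete, this is roughly what such a request body would have looked like; `late_chunking` is the field discussed here and is not part of the OpenAI embeddings spec, so a spec-compliant server would not recognize it:

```python
import json

# Sketch of an OpenAI-style embeddings request carrying a
# non-standard `late_chunking` field as a vendor extension.
payload = {
    "model": "jinaai/jina-embeddings-v3",
    "input": ["first chunk", "second chunk"],
    "late_chunking": True,  # not in the OpenAI specification
}
body = json.dumps(payload)
```

Some OpenAI client libraries allow smuggling such extra fields into a request (e.g. an "extra body" escape hatch), but the server still has to be taught to accept them, which is why a different mechanism is proposed instead.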

I experienced the same issue with standard Llama models from Meta as well (3.1 70B Instruct, and 3.3 70B Instruct). These models are hosted in my corporate infrastructure and usually...