Breno Faria

67 comments by Breno Faria

@model-collapse my use-case is the same as @reuschling's. I'm aware that running larger models in OpenSearch might not be the best approach, but there are plenty of small asymmetric models...

I have a PR ready that will allow you to use the neural search plugin with an e5 family model. As soon as https://github.com/opensearch-project/ml-commons/pull/2318 gets merged I will open it...

@reuschling @model-collapse I have opened the PR (#710) mentioned above.
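For context on why e5-family models need special handling: they are asymmetric, meaning queries and passages must be prepended with different text prefixes before embedding (per the intfloat/e5 model cards). A minimal sketch of that convention (prefix strings per the model cards; helper names are illustrative):

```python
# e5-style asymmetric embedding inputs: queries and passages get
# different prefixes before being sent to the model. Without these
# prefixes, retrieval quality degrades noticeably.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "

def as_query(text: str) -> str:
    """Format a search query for an e5-family model."""
    return QUERY_PREFIX + text

def as_passage(text: str) -> str:
    """Format a document/passage for an e5-family model."""
    return PASSAGE_PREFIX + text

print(as_query("how do asymmetric embedding models work"))
print(as_passage("Asymmetric models encode queries and documents differently."))
```

This is exactly the kind of per-input-type preprocessing that a serving plugin has to apply on the user's behalf, which is what the PR above is about.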

This would be a great step toward "productizing" this plugin. Is there already an idea about what criteria the benchmarking corpora should fulfill? E.g. size, domain, nature of queries (keywords,...

I think the decision to make V1 the default before it reached feature parity was ill-advised.

I have observed the same issue while load testing 0.6.0. I have also seen the error when GPU KV cache usage was close to 100%. I'm not sure there...

I have now observed another exception:

```
Sep 06 10:53:54 hal9000 docker[931748]: ERROR 09-06 01:53:54 async_llm_engine.py:63] Engine background task failed
Sep 06 10:53:54 hal9000 docker[931748]: ERROR 09-06 01:53:54 async_llm_engine.py:63] Traceback...
```

Also, adding `--disable-frontend-multiprocessing` does not work around this issue.

Neither does increasing `VLLM_RPC_GET_DATA_TIMEOUT_MS` or `VLLM_ENGINE_ITERATION_TIMEOUT_S`.
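For anyone trying to reproduce these workarounds: both knobs are environment variables read by vLLM, while `--disable-frontend-multiprocessing` is a server CLI flag. A configuration sketch of how they would be passed to a containerized server, assuming the standard `vllm/vllm-openai` image (image name and values are illustrative only, and as noted above, none of these settings resolved the issue):

```
# Illustrative only: raising the RPC and engine-iteration timeouts
# did NOT work around the failure described above.
docker run --gpus all \
  -e VLLM_RPC_GET_DATA_TIMEOUT_MS=60000 \
  -e VLLM_ENGINE_ITERATION_TIMEOUT_S=120 \
  vllm/vllm-openai \
  --model <model> \
  --disable-frontend-multiprocessing
```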

I really appreciate all the work done in the attempts so far (#2488, #3237, #4656)! I've been waiting for months now... I'd like to make a suggestion...