Breno Faria
@model-collapse my use case is the same as @reuschling's. I'm aware that running larger models in OpenSearch might not be the best approach, but there are plenty of small asymmetric models...
I have a PR ready that will allow you to use the neural search plugin with an e5-family model. As soon as https://github.com/opensearch-project/ml-commons/pull/2318 gets merged I will open it...
@reuschling @model-collapse I have opened the PR (#710) mentioned above.
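For context on why asymmetric support matters here: e5-family models expect queries and passages to be prepended with different prefixes before embedding, which is exactly the asymmetry a symmetric-only pipeline cannot express. A minimal sketch of that convention (the helper name is mine, not from the plugin):

```python
# Sketch of the e5 prefix convention: queries and passages are embedded
# with different prefixes ("query: " vs "passage: "), so the ingest side
# and the search side of the pipeline must format text differently.
def format_for_e5(text: str, is_query: bool) -> str:
    prefix = "query: " if is_query else "passage: "
    return prefix + text

q = format_for_e5("what is neural search?", is_query=True)
p = format_for_e5("Neural search retrieves documents via dense embeddings.", is_query=False)
```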
This would be a great step toward productizing this plugin. Is there already an idea of what criteria the benchmarking corpora should fulfill? E.g. size, domain, nature of queries (keywords,...
I think the decision to make V1 the default before it reached feature parity was premature.
I have observed the same issue while load testing 0.6.0. I have also observed the error when GPU KV cache usage was close to 100%. I'm not sure there...
I have now observed another exception:

```
Sep 06 10:53:54 hal9000 docker[931748]: ERROR 09-06 01:53:54 async_llm_engine.py:63] Engine background task failed
Sep 06 10:53:54 hal9000 docker[931748]: ERROR 09-06 01:53:54 async_llm_engine.py:63] Traceback...
```
Also, adding `--disable-frontend-multiprocessing` does not work around this issue.
Neither does increasing `VLLM_RPC_GET_DATA_TIMEOUT_MS` or `VLLM_ENGINE_ITERATION_TIMEOUT_S`.
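For reproducibility, this is roughly how those settings were applied (a sketch of a vLLM OpenAI-server launch; the model name and timeout values are illustrative, not recommendations):

```shell
# Assumption: vLLM reads these env vars at startup; values shown are
# arbitrary increases over the defaults, used only to test the workaround.
export VLLM_RPC_GET_DATA_TIMEOUT_MS=30000
export VLLM_ENGINE_ITERATION_TIMEOUT_S=120

# --disable-frontend-multiprocessing keeps the API frontend in the same
# process as the engine instead of a separate one.
python -m vllm.entrypoints.openai.api_server \
    --model my-org/my-model \
    --disable-frontend-multiprocessing
```

Neither the flag nor either timeout changed the failure mode in my tests.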
I really appreciate all the work done across the attempts so far (#2488, #3237, #4656)! I've been waiting for this for months now... I'd like to make a suggestion...