aresnow1
Embedding is a CPU-intensive call, and even for a stateless actor it is not executed concurrently, because the event loop stays blocked until the first call returns. Therefore, the...
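A minimal sketch of the issue (function names are hypothetical, not the actual actor code): when a CPU-bound call like embedding runs directly inside an async handler, the event loop is blocked, so a second request cannot start until the first one finishes; offloading the call to an executor keeps the loop responsive.

```python
import asyncio
import time

def embed(texts):
    # Stand-in for a CPU-bound embedding computation (hypothetical).
    time.sleep(1.0)
    return [[0.0] * 8 for _ in texts]

async def handle_request(texts):
    # Calling the CPU-bound function directly blocks the event loop,
    # so concurrent requests are effectively serialized.
    return embed(texts)

async def handle_request_offloaded(texts):
    # Offloading to an executor keeps the event loop free to serve other
    # requests (for truly CPU-bound work a process pool is needed to
    # overlap the computation itself).
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, embed, texts)

async def main():
    t0 = time.perf_counter()
    await asyncio.gather(handle_request(["a"]), handle_request(["b"]))
    print("blocking :", time.perf_counter() - t0)   # ~2s, calls run one after another

    t0 = time.perf_counter()
    await asyncio.gather(handle_request_offloaded(["a"]), handle_request_offloaded(["b"]))
    print("offloaded:", time.perf_counter() - t0)   # ~1s, calls overlap

asyncio.run(main())
```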
Thanks for your feedback; it appears that the supervisor's port is not exposed. We will fix this issue in the next version.
> Also, another question: if the GPU indices are not contiguous, for example there are 5 GPUs (0-4) and GPU 3 is occupied by another model, leaving 0, 1, 2 and 4, can n_gpu=4 still be used?

For n_gpu=4, it is best to make sure there are four idle GPUs.
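As a hedged sketch (the model name is a placeholder and client parameter names may differ between Xinference versions), one way to handle non-contiguous indices is to expose only the idle GPUs to the worker via CUDA_VISIBLE_DEVICES and then launch with n_gpu=4:

```python
# Set in the environment of the Xinference worker *before* it starts, so that
# only the idle GPUs (0, 1, 2 and 4 in the example above) are visible.
# Inside the process they are renumbered 0-3, so n_gpu=4 maps onto exactly those cards.
#   export CUDA_VISIBLE_DEVICES=0,1,2,4
#   xinference-local ...   # or start the worker, depending on your deployment

# Then launch the model across four GPUs via the Python client.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
client.launch_model(model_name="llama-2-chat", n_gpu=4)
```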
Do you have GPU cards on your machine?
We have created a pull request (https://github.com/langchain-ai/langchain/pull/12702) and are waiting for it to be merged!
Additionally, the model ability must be set to "chat".
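For a custom model registration, that means the model description should list "chat" under model_ability. A rough sketch (field values are placeholders and the exact schema and client signature may differ between Xinference versions):

```python
import json
from xinference.client import Client

# Minimal custom-LLM description; only the fields relevant here are shown,
# and all values are placeholders.
model_description = {
    "version": 1,
    "model_name": "my-custom-model",
    "context_length": 4096,
    "model_lang": ["en"],
    "model_ability": ["chat"],          # must include "chat" for chat usage
    "model_family": "llama-2-chat",
    "model_specs": [
        {
            "model_format": "pytorch",
            "model_size_in_billions": 7,
            "quantizations": ["none"],
            "model_uri": "file:///path/to/model",
        }
    ],
}

client = Client("http://127.0.0.1:9997")
# register_model is assumed here; check your Xinference version for the exact signature.
client.register_model(model_type="LLM", model=json.dumps(model_description), persist=True)
```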
A Python API for in-flight batching is needed for this PR; the TensorRT-LLM team says it will be implemented in future versions.
Refer to the documentation: https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#configure-xinference-home-path
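As a small hedged example (the path is a placeholder), the home directory can be pointed at a location with enough disk space via the XINFERENCE_HOME environment variable before the server starts:

```python
import os

# Placeholder path; set XINFERENCE_HOME in the environment that starts the
# Xinference server so models, logs and caches are stored there.
os.environ["XINFERENCE_HOME"] = "/data/xinference"

# Equivalent shell form before launching:
#   XINFERENCE_HOME=/data/xinference xinference-local --host 0.0.0.0 --port 9997
```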
> I have a similar question too. I download models from Hugging Face via aria2 because it supports multi-threaded downloads. After downloading, I don't know how to put the models...
Currently, there is no interface that directly shuts down the cluster; a stop interface could be added to the cluster API.