gritlm How would one go about running embedding as a service using something like vLLM?


Open sungkim11 opened this issue 1 year ago • 5 comments


I would like to run embedding as a service using something like vLLM in a Docker container on a different host. How would one go about doing this?

Feb 18 '24 05:02 sungkim11

I think it should be easy to serve GritLM using vLLM or similar, providing access to its embedding capability, its language modeling capability, or both through one single model / endpoint. But I'm not sure about the details of vLLM etc.

Feb 18 '24 16:02 Muennighoff

Would we just need to get the last hidden states for the embed token and return them from vLLM at inference time?

Jul 12 '24 20:07 creatorrr

Take the last hidden state for the entire sequence to be embedded, then mean pool it.

Jul 12 '24 20:07 Muennighoff
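
For illustration, the mean pooling step described above could look roughly like the sketch below. It is dependency-free and uses plain lists in place of the tensors a real model would return. The names `mean_pool` and `mask` are hypothetical, and which positions get masked out (for example padding) is an assumption, not something stated in this thread.

```python
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]],
              mask: Sequence[int]) -> List[float]:
    """Average the per-token vectors at positions where mask is 1.

    hidden_states: one vector per token, taken from the model's last layer.
    mask: 1 for tokens that should count toward the embedding, 0 otherwise
          (hypothetical convention, e.g. 0 for padding).
    """
    if len(hidden_states) != len(mask):
        raise ValueError("hidden_states and mask must have the same length")
    kept = [vec for vec, keep in zip(hidden_states, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    # Average each hidden dimension over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: 3 tokens with hidden size 2; the last token is treated as padding.
states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
embedding = mean_pool(states, mask=[1, 1, 0])
```

In an actual serving setup the same averaging would be applied to the last-layer hidden states returned by the model, one sequence at a time or batched.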

vLLM seems to support an encode method (which we need for an embedding model) since vLLM 0.4.3, but I am running into some issues. When I run GritLM in unified mode, vLLM doesn't seem to consider GritLM an embedding model and doesn't allow it to call encode() (according to this issue: https://github.com/vllm-project/vllm/issues/6015), and I get this error:

      File "/miniconda/envs/dg_venv/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 116, in prepare
        ) = _prepare_seq_groups(seq_group_metadata_list, seq_lens, query_lens,
      File "/miniconda/envs/dg_venv/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 208, in _prepare_seq_groups
        if sampling_params.seed is not None:
    AttributeError: 'NoneType' object has no attribute 'seed'

Has anyone had the same problem, or is anyone working on embedding models with vLLM?

Jul 16 '24 09:07 GeraldWu23

> I would like to run embedding as a service using something like vLLM on a Docker container on different host. How would one go about doing this?

Have you solved the problem?

Aug 27 '24 04:08 zhuxiaohai
