[Feature] Any way to get the logits instead of logprobs in lmdeploy?
Motivation
I want to use lmdeploy to deploy internlm2-7b-reward. I simply expanded the weight of v_head from [1, D] to [V, D]. If we could directly obtain the logits instead of logprobs, we could easily deploy the reward model with lmdeploy.
Related resources
No response
Additional context
No response
@irexyc
Sorry for the late reply.
Currently, you can use pipe.get_logits to obtain the logits with shape seq_len x vocab_size. It is not thread safe, so I'm not sure whether this meets your needs.
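As a minimal sketch of what that could look like for the reward setup above (the model path, tokenizer usage, and the idea of reading the score from a fixed row of the expanded head are assumptions for illustration, not confirmed lmdeploy behavior):

# Sketch: assumes pipe.get_logits accepts token ids and returns a
# seq_len x vocab_size tensor, as described in the reply above.
from transformers import AutoTokenizer
from lmdeploy import pipeline

model_path = 'internlm/internlm2-7b-reward'  # hypothetical path to the modified checkpoint
pipe = pipeline(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer.encode('How do I bake bread?')
logits = pipe.get_logits(input_ids)  # seq_len x vocab_size; exact return type may vary by version
reward = logits[-1, 0]               # assumption: row 0 of the expanded v_head holds the reward
print(float(reward))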
Hi guys, how can I get logprobs from the service? I deployed InternVL2 76B with lmdeploy:
lmdeploy serve api_server \
    --server-port $PORT \
    --tp $NUM_SHARD \
    --log-level $LOG_LEVEL \
    --cache-max-entry-count $CACHE_FRACTION \
    --session-len $MAX_TOKENS \
    --backend turbomind \
    --vision-max-batch-size $V_MAX_BATCH_SIZE \
    $MODEL_DIR
and used the OpenAI SDK to chat:
response = self.client.chat.completions.create(
    model=self.model_name,
    messages=messages,
    temperature=0.0,
    max_tokens=self.max_tokens,
    logprobs=True,
    top_logprobs=2,
)
but I only got the logprob of the chosen token, like:
logprob=0.0, top_logprobs=[]
So, how can I get logprobs the way OpenAI's API returns them? Thanks.
@heibaidaolx123
Do you mean that top_logprobs should always contain the chosen token, or that top_logprobs should have a length of 2 or 3?
The format may not be completely consistent with vllm, but the logic for obtaining logprobs is similar to vllm's: it uses the probabilities after sampling. If you set temperature=0, there is effectively only one candidate to sample, so the logprob should be 0. (I think selecting the top 2 from [1.0, -inf, -inf, ...] is meaningless.)
With vllm, you may get something like the output below, but the probability of the second item has actually been set to -inf by _apply_top_k_top_p.
0.0 [TopLogprob(token='I', bytes=[73], logprob=0.0), TopLogprob(token='The', bytes=[84, 104, 101], logprob=-149.8536376953125)]
With OpenAI's API, if I set temperature to 0, I can't get logprobs, and I have no idea how they compute them.
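To illustrate the sampling point, here is a small sketch of reading top_logprobs through the OpenAI SDK with a nonzero temperature, so there is more than one candidate to sample (the base_url and model name are placeholders, not values from this thread):

# Sketch: with temperature > 0 the sampler keeps several candidates,
# so top_logprobs can contain meaningful alternatives.
from openai import OpenAI

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')  # placeholder endpoint
response = client.chat.completions.create(
    model='internvl2',  # placeholder model name
    messages=[{'role': 'user', 'content': 'Describe the weather in one word.'}],
    temperature=0.7,
    max_tokens=8,
    logprobs=True,
    top_logprobs=2,
)
for token_info in response.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob,
          [(t.token, t.logprob) for t in token_info.top_logprobs])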
@irexyc Thanks for the explanation. I haven't checked the code, but I tried vllm with the following params:
response = self.client.chat.completions.create(
    model=self.model_name,
    messages=messages,
    temperature=0.0,
    max_tokens=self.max_tokens,
    logprobs=True,
    top_logprobs=2,
)
It did return the top-2 logprobs, like this:
top_logprobs=[TopLogprob(token='1', bytes=[xxx], logprob=-0.0025603154208511114), TopLogprob(token='2', bytes=[xxx], logprob=-6.705685138702393)]),
@heibaidaolx123
If you set temperature=0.0 in vllm, you actually get temperature=1.0 instead. Also, vllm enforces a minimum temperature of 0.01, so you can try temperature=0.01 in vllm.
I am currently working with the pipe.get_logits function and have observed that it returns the logits for the input sequence. I am interested in obtaining the logits for the output sequence generated by the model. Specifically, I am using the following code:
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image
model = 'OpenGVLab/InternVL2-1B'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
response = pipe(('describe this image', image))
print(response.text)
Could you please advise on how to modify this code to retrieve the logits corresponding to the generated output sequence? I would greatly appreciate your guidance on this matter.
We are encountering the same issue. Is there a way for the model to return the actual logits of the output sequences? Any assistance would be appreciated!
It will be provided in v0.7.0. PR #3008 is dealing with it.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
You may try the latest v0.7.0 for logits. Here is the guide: https://lmdeploy.readthedocs.io/en/latest/multi_modal/vl_pipeline.html#output-logits-for-generated-tokens
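Following that guide, the usage should look roughly like the sketch below, adapted to the InternVL2-1B example earlier in this thread (paraphrased from the linked page; please verify the exact parameter names against the guide for your lmdeploy version):

# Sketch based on the linked guide: request logits for the generated tokens
# via GenerationConfig in lmdeploy >= 0.7.0.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

gen_config = GenerationConfig(output_logits='generation', max_new_tokens=64)
response = pipe(('describe this image', image), gen_config=gen_config)
print(response.text)
print(response.logits.shape)  # expected: [num_generated_tokens, vocab_size]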
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.