[Feature] Any way to get the logits instead of logprobs in lmdeploy?
Motivation
I want to use lmdeploy to deploy internlm2-7b-reward. I simply expanded the weight of v_head from [1, D] to [V, D]. If we could directly obtain the logits instead of logprobs, we could easily deploy the reward model with lmdeploy.
Related resources
No response
Additional context
No response
@irexyc
Sorry for the late reply.
Currently, you can use pipe.get_logits to obtain the logits with shape seq_len x vocab_size. It is not thread safe, so I'm not sure whether this meets your needs.
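As a minimal sketch of what that could look like for the reward setup above (the model path, tokenizer usage, and the idea of reading the score from a fixed row of the expanded head are assumptions for illustration, not confirmed lmdeploy behavior):

# Sketch: assumes pipe.get_logits accepts token ids and returns a
# seq_len x vocab_size tensor, as described in the reply above.
from transformers import AutoTokenizer
from lmdeploy import pipeline

model_path = 'internlm/internlm2-7b-reward'  # hypothetical path to the modified checkpoint
pipe = pipeline(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer.encode('How do I bake bread?')
logits = pipe.get_logits(input_ids)  # seq_len x vocab_size; exact return type may vary by version
reward = logits[-1, 0]               # assumption: row 0 of the expanded v_head holds the reward
print(float(reward))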
Hi guys, how can I get logprobs from the service? I deployed InternVL2 76B with lmdeploy:
lmdeploy serve api_server \
    --server-port $PORT \
    --tp $NUM_SHARD \
    --log-level $LOG_LEVEL \
    --cache-max-entry-count $CACHE_FRACTION \
    --session-len $MAX_TOKENS \
    --backend turbomind \
    --vision-max-batch-size $V_MAX_BATCH_SIZE \
    $MODEL_DIR
and used the OpenAI SDK to chat:
response = self.client.chat.completions.create(
    model=self.model_name,
    messages=messages,
    temperature=0.0,
    max_tokens=self.max_tokens,
    logprobs=True,
    top_logprobs=2,
)
but I only got the logprob of the chosen token, like:
logprob=0.0, top_logprobs=[]
So, how can I get logprobs the way OpenAI's API returns them? Thanks.
@heibaidaolx123
Do you mean that top_logprobs should always contain the chosen token, or that top_logprobs should have a length of 2 or 3?
The format may not be completely consistent with vllm, but the logic for obtaining logprobs is similar to vllm's: it uses the probabilities after sampling. If you set temperature=0, there is effectively only one candidate to sample, so the logprob should be 0. (I think selecting the top 2 from [1.0, -inf, -inf, ...] is meaningless.)
With vllm, you may get something like the output below, but the probability of the second item has actually been set to -inf by _apply_top_k_top_p.
0.0 [TopLogprob(token='I', bytes=[73], logprob=0.0), TopLogprob(token='The', bytes=[84, 104, 101], logprob=-149.8536376953125)]
With OpenAI's API, if I set temperature to 0, I can't get logprobs, and I have no idea how they compute them.
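To illustrate the sampling point, here is a small sketch of reading top_logprobs through the OpenAI SDK with a nonzero temperature, so there is more than one candidate to sample (the base_url and model name are placeholders, not values from this thread):

# Sketch: with temperature > 0 the sampler keeps several candidates,
# so top_logprobs can contain meaningful alternatives.
from openai import OpenAI

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')  # placeholder endpoint
response = client.chat.completions.create(
    model='internvl2',  # placeholder model name
    messages=[{'role': 'user', 'content': 'Describe the weather in one word.'}],
    temperature=0.7,
    max_tokens=8,
    logprobs=True,
    top_logprobs=2,
)
for token_info in response.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob,
          [(t.token, t.logprob) for t in token_info.top_logprobs])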
@irexyc Thanks for the explanation. I haven't checked the code, but I tried vllm with the following params:
response = self.client.chat.completions.create(
    model=self.model_name,
    messages=messages,
    temperature=0.0,
    max_tokens=self.max_tokens,
    logprobs=True,
    top_logprobs=2,
)
It did return the top-2 logprobs, like this:
top_logprobs=[TopLogprob(token='1', bytes=[xxx], logprob=-0.0025603154208511114), TopLogprob(token='2', bytes=[xxx], logprob=-6.705685138702393)]),
@heibaidaolx123
If you set temperature=0.0 in vllm, you actually get temperature=1.0 instead. Also, vllm enforces a minimum temperature of 0.01, so you can try temperature=0.01 in vllm.
I am currently working with the pipe.get_logits function and have observed that it returns the logits for the input sequence. I am interested in obtaining the logits for the output sequence generated by the model. Specifically, I am using the following code:
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image
model = 'OpenGVLab/InternVL2-1B'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
response = pipe(('describe this image', image))
print(response.text)
Could you please advise on how to modify this code to retrieve the logits corresponding to the generated output sequence? I would greatly appreciate your guidance on this matter.
We are encountering the same issue. Is there a way for the model to return the actual logits of the output sequences? Any assistance would be appreciated!
It will be provided in v0.7.0. PR #3008 is dealing with it.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
You may try the latest v0.7.0 for logits. Here is the guide: https://lmdeploy.readthedocs.io/en/latest/multi_modal/vl_pipeline.html#output-logits-for-generated-tokens
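Following that guide, the usage should look roughly like the sketch below, adapted to the InternVL2-1B example earlier in this thread (paraphrased from the linked page; please verify the exact parameter names against the guide for your lmdeploy version):

# Sketch based on the linked guide: request logits for the generated tokens
# via GenerationConfig in lmdeploy >= 0.7.0.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-1B'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

gen_config = GenerationConfig(output_logits='generation', max_new_tokens=64)
response = pipe(('describe this image', image), gen_config=gen_config)
print(response.text)
print(response.logits.shape)  # expected: [num_generated_tokens, vocab_size]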
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.