In the LLM inference scenario, how do I set 'ignore_eos=True' in OVMS? Does OVMS really support this setting?
OVMS version: 24.2
My goal is to use OVMS for LLM inference, where the end of generation is determined by the configured length of generated tokens rather than by the EOS token;
I saw in the docs that OVMS supports this, but I tried many approaches and it still didn't work.
The file I changed is below:
demos\python_demos\llm_text_generation\servable_stream\model.py
I tried:
Case 1: no error, but it did not work.
Case 2:
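(Roughly, this attempt passed ignore_eos directly into the kwargs handed to generate() in model.py. The snippet below is a hedged sketch of that kind of change, not the exact code; tokens and ov_model_exec come from the demo's model.py.)

# sketch only: adding ignore_eos to the generate() kwargs
generate_kwargs = dict(
    max_new_tokens=500,   # illustrative value
    ignore_eos=True,      # transformers' generate() does not accept this kwarg
)
result = ov_model_exec.generate(**tokens, **generate_kwargs)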
ERROR:
/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Compiling the model to CPU ...
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/workspace/model.py", line 226, in generate
result = ov_model_exec.generate(**tokens, **generate_kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1384, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1130, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['ignore_eos'] (note: typos in the generate arguments will also show up in this list)
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/workspace/model.py", line 226, in generate
def generate():
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1384, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1130, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['ignore_eos'] (note: typos in the generate arguments will also show up in this list)
Exception in thread Thread-5 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/workspace/model.py", line 226, in generate
result = ov_model_exec.generate(**tokens, **generate_kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1384, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1130, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['ignore_eos'] (note: typos in the generate arguments will also show up in this list)
Case 3: no error, but it did not work.
I made StopOnTokens() always return False.
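(A hedged sketch of that change: the StoppingCriteria subclass used by the demo, forced to never signal a stop. Generation most likely still ends at EOS because generate() handles eos_token_id on its own, regardless of this custom criteria.)

import torch
from transformers import StoppingCriteria

class StopOnTokens(StoppingCriteria):
    # sketch: always report "do not stop", whatever tokens were generated
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return False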
Hi @HPUedCSLearner, ignore_eos, as well as other text-generation-specific parameters, is available via the OpenAI API, which is not supported via Python nodes. To use the OpenAI API, try:
demo: https://github.com/openvinotoolkit/model_server/tree/main/demos/continuous_batching
documentation: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md
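For reference, a request along these lines passes ignore_eos together with max_tokens through the OpenAI-compatible endpoint; the port, endpoint path and model name below are assumptions based on the continuous batching demo:

import requests

payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed servable name from the demo
    "messages": [{"role": "user", "content": "Write a short story."}],
    "max_tokens": 128,    # target generation length
    "ignore_eos": True,   # keep generating even after an EOS token appears
    "stream": False,
}
resp = requests.post("http://localhost:8000/v3/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])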
Thank you very much, I'll try it in a moment
Hi @HPUedCSLearner,
ignore_eos, as well as other text-generation-specific parameters, is available via the OpenAI API, which is not supported via Python nodes. To use the OpenAI API, try: demo: https://github.com/openvinotoolkit/model_server/tree/main/demos/continuous_batching documentation: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md
I ran the demo and it works, but how can I run a new model like Qwen? I get the error "Trying to parse mediapipe graph definition: Qwen/Qwen1.5-4B-Chat failed".
I would really appreciate it if you could tell me how to set up the "mediapipe config_list" configuration for a new LLM model.
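(For context, the continuous batching demo registers the servable through mediapipe_config_list in config.json; a sketch of what that might look like for Qwen, where the name and base_path values are assumptions:)

{
    "model_config_list": [],
    "mediapipe_config_list": [
        {
            "name": "Qwen/Qwen1.5-4B-Chat",
            "base_path": "Qwen1.5-4B-Chat"
        }
    ]
}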
Could you share graph.pbtxt? Seems like the error is in parsing graph configuration.
I started from the Quickstart without making any changes. Here is the content of my graph.pbtxt:
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"
node: {
name: "LLMExecutor"
calculator: "HttpLLMCalculator"
input_stream: "LOOPBACK:loopback"
input_stream: "HTTP_REQUEST_PAYLOAD:input"
input_side_packet: "LLM_NODE_RESOURCES:llm"
output_stream: "LOOPBACK:loopback"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"
input_stream_info: {
tag_index: 'LOOPBACK:0',
back_edge: true
}
node_options: {
[type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
models_path: "./"
}
}
input_stream_handler {
input_stream_handler: "SyncSetInputStreamHandler",
options {
[mediapipe.SyncSetInputStreamHandlerOptions.ext] {
sync_set {
tag_index: "LOOPBACK:0"
}
}
}
}
}
Could you share graph.pbtxt? Seems like the error is in parsing graph configuration.
This is the content of my config.json and graph.pbtxt. Please help me analyze what the configuration error is, thank you very much!
In your graph.pbtxt in line 14 you're missing quotes - it should be:
tag_index: "LOOPBACK:0",
Thank you very much, it works. But I have another problem; I will study it myself first. And thank you again!