
In the LLM inference scenario, how do I set 'ignore_eos=True' in OVMS? Does OVMS really support this setting?

HPUedCSLearner opened this issue 1 year ago • 3 comments

OVMS version : 24.2

My goal is to use OVMS for LLM inference, with the end of generation determined by the number of generated tokens rather than by the EOS token.

I saw in the docs that OVMS supports this, but I tried many methods and none of them worked.

[screenshot]

The file I changed is below: demos/python_demos/llm_text_generation/servable_stream/model.py [screenshot]

HPUedCSLearner avatar Jun 28 '24 09:06 HPUedCSLearner

I tried:

case 1: no error, but it did not work

[screenshot]

case 2: raises an error

[screenshot]
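(The screenshot did not survive; below is a hypothetical reconstruction of case 2, inferred from the traceback that follows, where ignore_eos is forwarded into generate():)

# Hypothetical reconstruction of case 2 in model.py (exact lines lost with the screenshot).
# 'ignore_eos' is added to the kwargs forwarded to transformers' generate():
generate_kwargs["ignore_eos"] = True
result = ov_model_exec.generate(**tokens, **generate_kwargs)  # raises the ValueError below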

ERROR:

/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Compiling the model to CPU ...
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/model.py", line 226, in generate
    result = ov_model_exec.generate(**tokens, **generate_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1384, in generate
    self._validate_model_kwargs(model_kwargs.copy())
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1130, in _validate_model_kwargs
    raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['ignore_eos'] (note: typos in the generate arguments will also show up in this list)
(Thread-4 and Thread-5 fail with the same ValueError.)
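The ValueError is expected: ignore_eos is a serving-API parameter (vLLM/OpenAI-style), not a Hugging Face transformers generate() argument, so argument validation rejects it. With plain transformers, a comparable effect can be had with min_new_tokens, which suppresses the EOS token until the requested length is reached. A minimal standalone sketch (the model name is just an example):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model; any causal LM behaves the same way here.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokens = tok("Once upon a time", return_tensors="pt")
# Setting min_new_tokens equal to max_new_tokens suppresses EOS for the
# whole generation, which is the effect 'ignore_eos=True' aims for.
out = model.generate(**tokens, min_new_tokens=64, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))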

case 3: no error, but it did not work

I set StopOnTokens() to always return False. [screenshot]
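(For context, StopOnTokens in the demo's model.py is a transformers StoppingCriteria; a sketch of the case-3 change, with the body assumed from the standard pattern:)

from transformers import StoppingCriteria

class StopOnTokens(StoppingCriteria):
    # Case 3: never request a stop, so this criterion ignores EOS tokens.
    def __call__(self, input_ids, scores, **kwargs) -> bool:
        return False

Note that a custom criterion returning False does not disable the eos_token_id stop condition built into generate() itself, which is likely why this change alone had no visible effect.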

HPUedCSLearner avatar Jun 28 '24 09:06 HPUedCSLearner

Hi @HPUedCSLearner, ignore_eos, like other text-generation-specific parameters, is available via the OpenAI API and is not supported in Python nodes (which the demo you modified uses). To use the OpenAI API, try:

demo: https://github.com/openvinotoolkit/model_server/tree/main/demos/continuous_batching
documentation: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md
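(For illustration, a request to the continuous batching endpoint could look like the sketch below; the port and model name follow the demo defaults and are assumptions here, while ignore_eos is the extension parameter described in the reference doc:)

import requests

# Assumes OVMS from the continuous batching demo serving on localhost:8000,
# with the model registered under this name in config.json.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Write a short story."}],
    "max_tokens": 100,
    "ignore_eos": True,   # keep generating until max_tokens, ignoring EOS
}
resp = requests.post("http://localhost:8000/v3/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])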

mzegla avatar Jul 01 '24 07:07 mzegla

Thank you very much, I'll try it in a moment

HPUedCSLearner avatar Jul 01 '24 09:07 HPUedCSLearner

> Hi @HPUedCSLearner, ignore_eos, like other text-generation-specific parameters, is available via the OpenAI API and is not supported in Python nodes. To use the OpenAI API, try the demo: https://github.com/openvinotoolkit/model_server/tree/main/demos/continuous_batching and the documentation: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md

Running the demo works fine, but how can I run a new model like Qwen? I get an error: "Trying to parse mediapipe graph definition: Qwen/Qwen1.5-4B-Chat failed".

I would really appreciate it if you could tell me how to set up the "mediapipe_config_list" configuration for a new LLM model.

[screenshot]
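(For reference, a config.json that registers an LLM graph under mediapipe_config_list typically looks like the sketch below; the paths are assumptions, and the name is taken from the error message:)

{
    "model_config_list": [],
    "mediapipe_config_list": [
        {
            "name": "Qwen/Qwen1.5-4B-Chat",
            "graph_path": "/workspace/Qwen/Qwen1.5-4B-Chat/graph.pbtxt"
        }
    ]
}

The name is what clients pass as "model" in requests, and graph_path must point at the graph.pbtxt placed next to the exported model files.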

HPUedCSLearner avatar Jul 02 '24 11:07 HPUedCSLearner

Could you share graph.pbtxt? It seems the error is in parsing the graph configuration.

mzegla avatar Jul 02 '24 11:07 mzegla

I started with the Quickstart without making any changes. Here is the content of my graph.pbtxt:

input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: 'LOOPBACK:0',
    back_edge: true
  }
  node_options: {
      [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
          models_path: "./"
      }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler",
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}

HPUedCSLearner avatar Jul 02 '24 13:07 HPUedCSLearner

> Could you share graph.pbtxt? It seems the error is in parsing the graph configuration.

This is the content of my config.json and graph.pbtxt; please help me analyze what the configuration error is. Thank you very much! [screenshot]

HPUedCSLearner avatar Jul 03 '24 02:07 HPUedCSLearner

In your graph.pbtxt, on line 14, you're missing quotes. It should be: tag_index: "LOOPBACK:0",
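That is, the corrected block should read:

  input_stream_info: {
    tag_index: "LOOPBACK:0",
    back_edge: true
  }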

mzegla avatar Jul 03 '24 07:07 mzegla

Thank you very much, it works. I have another problem, but I will study it myself first. Thank you again!

HPUedCSLearner avatar Jul 03 '24 08:07 HPUedCSLearner