In the LLM inference scenario, how do I set 'ignore_eos=True' in OVMS? Does OVMS really support this setting?
OVMS version: 24.2
My goal is to use OVMS for LLM inference, where the end of generation is determined by the configured length of generated tokens rather than by the EOS token;
I saw in the docs that OVMS supports this, but I tried many approaches and it still didn't work.
The file I changed is below:
demos\python_demos\llm_text_generation\servable_stream\model.py
I tried:
Case 1: no error, but it did not work.
Case 2:
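(Roughly, this attempt passed ignore_eos directly into the kwargs handed to generate() in model.py. The snippet below is a hedged sketch of that kind of change, not the exact code; tokens and ov_model_exec come from the demo's model.py.)

# sketch only: adding ignore_eos to the generate() kwargs
generate_kwargs = dict(
    max_new_tokens=500,   # illustrative value
    ignore_eos=True,      # transformers' generate() does not accept this kwarg
)
result = ov_model_exec.generate(**tokens, **generate_kwargs)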
ERROR:
/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Compiling the model to CPU ...
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/workspace/model.py", line 226, in generate
result = ov_model_exec.generate(**tokens, **generate_kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1384, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1130, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['ignore_eos'] (note: typos in the generate arguments will also show up in this list)
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/workspace/model.py", line 226, in generate
def generate():
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1384, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1130, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['ignore_eos'] (note: typos in the generate arguments will also show up in this list)
Exception in thread Thread-5 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/workspace/model.py", line 226, in generate
result = ov_model_exec.generate(**tokens, **generate_kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1384, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1130, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['ignore_eos'] (note: typos in the generate arguments will also show up in this list)
Case 3: no error, but it did not work.
I made StopOnTokens() always return False.
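(A hedged sketch of that change: the StoppingCriteria subclass used by the demo, forced to never signal a stop. Generation most likely still ends at EOS because generate() handles eos_token_id on its own, regardless of this custom criteria.)

import torch
from transformers import StoppingCriteria

class StopOnTokens(StoppingCriteria):
    # sketch: always report "do not stop", whatever tokens were generated
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return False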
Hi @HPUedCSLearner, ignore_eos, as well as other text-generation-specific parameters, is available via the OpenAI API, which is not supported via Python nodes. To use the OpenAI API, try:
demo: https://github.com/openvinotoolkit/model_server/tree/main/demos/continuous_batching
documentation: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md
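For reference, a request along these lines passes ignore_eos together with max_tokens through the OpenAI-compatible endpoint; the port, endpoint path and model name below are assumptions based on the continuous batching demo:

import requests

payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed servable name from the demo
    "messages": [{"role": "user", "content": "Write a short story."}],
    "max_tokens": 128,    # target generation length
    "ignore_eos": True,   # keep generating even after an EOS token appears
    "stream": False,
}
resp = requests.post("http://localhost:8000/v3/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])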
Thank you very much, I'll try it in a moment
Hi @HPUedCSLearner,
ignore_eos, as well as other text-generation-specific parameters, is available via the OpenAI API, which is not supported via Python nodes. To use the OpenAI API, try: demo: https://github.com/openvinotoolkit/model_server/tree/main/demos/continuous_batching documentation: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md
I ran the demo and it works, but how can I run a new model like Qwen? I get the error "Trying to parse mediapipe graph definition: Qwen/Qwen1.5-4B-Chat failed".
I would really appreciate it if you could tell me how to set up the "mediapipe config_list" configuration for a new LLM model.
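(For context, the continuous batching demo registers the servable through mediapipe_config_list in config.json; a sketch of what that might look like for Qwen, where the name and base_path values are assumptions:)

{
    "model_config_list": [],
    "mediapipe_config_list": [
        {
            "name": "Qwen/Qwen1.5-4B-Chat",
            "base_path": "Qwen1.5-4B-Chat"
        }
    ]
}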
Could you share graph.pbtxt? Seems like the error is in parsing graph configuration.
I started from the Quickstart without making any changes. Here is the content of my graph.pbtxt:
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"
node: {
name: "LLMExecutor"
calculator: "HttpLLMCalculator"
input_stream: "LOOPBACK:loopback"
input_stream: "HTTP_REQUEST_PAYLOAD:input"
input_side_packet: "LLM_NODE_RESOURCES:llm"
output_stream: "LOOPBACK:loopback"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"
input_stream_info: {
tag_index: 'LOOPBACK:0',
back_edge: true
}
node_options: {
[type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
models_path: "./"
}
}
input_stream_handler {
input_stream_handler: "SyncSetInputStreamHandler",
options {
[mediapipe.SyncSetInputStreamHandlerOptions.ext] {
sync_set {
tag_index: "LOOPBACK:0"
}
}
}
}
}
Could you share graph.pbtxt? Seems like the error is in parsing graph configuration.
This is the content of my config.json and graph.pbtxt. Please help me analyze what the configuration error is, thank you very much!
In your graph.pbtxt in line 14 you're missing quotes - it should be:
tag_index: "LOOPBACK:0",
Thank you very much, it works. But I have another problem; I will study it myself first. And thank you again!