
I deployed my own LLM (Qwen) behind an OpenAI-compatible interface. How should I evaluate this model? Why does the documentation only seem to cover GPT?

SafeCool opened this issue 6 months ago • 2 comments

SafeCool · May 29 '25 08:05

Hi @SafeCool! Thanks for your question.

You can absolutely evaluate your own LLMs, such as Qwen, using the OpenAI-compatible API in Opik! While the documentation often refers to "GPT" as a default example, Opik is designed to work with any model that implements the OpenAI API standard—including your self-hosted models like Qwen.

How to evaluate your own LLM:

  • Add your model to Opik using the OpenAI-compatible endpoint (just like you would for GPT).
  • Set your model as the evaluation target in your Opik project or experiment.
  • Use the same evaluation flows as shown in the docs for GPT—Opik will treat your model exactly the same (see the sketch below).
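As a rough illustration, here is a minimal sketch of that flow, assuming a self-hosted OpenAI-compatible server at http://127.0.0.1:8000/v1 that serves a model registered as "qwen" (the endpoint, key, and model name are placeholders, not values from your setup):

import os
from openai import OpenAI
from opik import track
from opik.integrations.openai import track_openai

# Point the standard OpenAI client at the self-hosted endpoint instead of api.openai.com
client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key=os.environ.get("API_KEY", "dummy"),  # most local servers accept any non-empty key
)
openai_client = track_openai(client)  # every call is now logged as a trace in Opik

@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model="qwen",  # whatever name your server registers the model under
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content

From there you can wrap your_llm_application in an evaluation task and pass it to opik.evaluation.evaluate(), exactly as the GPT examples in the docs do.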

japdubengsub · May 29 '25 14:05

ERROR:

OPIK: Failed to compute metric hallucination_metric. Score result will be marked as failed.

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 725, in completion
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 627, in completion
    openai_client: OpenAI = self._get_openai_client(  # type: ignore
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 377, in _get_openai_client
    _new_client = OpenAI(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/_client.py", line 126, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 1854, in completion
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 1827, in completion
    response = openai_chat_completions.completion(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 736, in completion
    raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/engine/engine.py", line 58, in _evaluate_test_case
    result = metric.score(**score_kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/opik/decorator/base_track_decorator.py", line 314, in wrapper
    raise func_exception
  File "/root/anaconda3/lib/python3.11/site-packages/opik/decorator/base_track_decorator.py", line 287, in wrapper
    result = func(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 91, in score
    model_output = self._model.generate_string(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 149, in generate_string
    response = self.generate_provider_response(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 183, in generate_provider_response
    response = self._engine.completion(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/utils.py", line 1283, in wrapper
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/utils.py", line 1161, in wrapper
    result = original_function(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 3241, in completion
    raise exception_type(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2239, in exception_type
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 378, in exception_type
    raise AuthenticationError(
litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

[The same "Failed to compute metric hallucination_metric" exception chain is repeated two more times in the log.]

CODE:

from opik import Opik, track
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals, Hallucination
from opik.integrations.openai import track_openai
import openai
import os

from openai import OpenAI

# Define the task to evaluate
os.environ["OPIK_API_KEY"] = "xxx"
os.environ["OPIK_WORKSPACE"] = "xxx"

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1/",
    api_key="{}".format(os.environ.get("API_KEY", "0")),
)
openai_client = track_openai(client)

MODEL = "qwen"

@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": input}]
    )
    return response.choices[0].message.content

# Define the evaluation task
def evaluation_task(x):
    return {"output": your_llm_application(x['input'])}

# Create a simple dataset
client = Opik()
dataset = client.get_or_create_dataset(name="Demo dataset")

# Define the metrics
hallucination_metric = Hallucination()

evaluation = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
    experiment_config={"model": MODEL},
)

SafeCool · May 30 '25 02:05

Thanks for sharing the code!

The issue might be that you're initializing the Hallucination metric without explicitly setting the LLM, so by default, it uses OpenAI GPT-4o/3.5-turbo under the hood.

Could you try manually specifying the model, like this:

from opik.evaluation import models

judge_model = models.LiteLLMChatModel(
    model_name="qwen",  # or any other local model
    base_url="http://127.0.0.1:8000/v1",
    api_key="somekey"
)

hallucination = Hallucination(model=judge_model)
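
If it helps, a quick sanity check of the judge configuration might look like the snippet below; the example strings are made up, just to confirm the judge model responds before running a full evaluation:

result = hallucination.score(
    input="What is the capital of France?",
    output="The capital of France is Berlin.",
    context=["Paris is the capital of France."],
)
# ScoreResult: .value is the hallucination score, .reason is the judge's explanation
print(result.value, result.reason)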

Please let us know if that helps!

japdubengsub · May 30 '25 12:05

CODE:

judge_model = models.LiteLLMChatModel(
    model="qwen",
    base_url="http://127.0.0.1:8000/v1",
    api_key="00"
)

hallucination_metric = Hallucination(
    model=judge_model
)

ERROR:

  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 91, in score
    model_output = self._model.generate_string(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 149, in generate_string
    response = self.generate_provider_response(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 183, in generate_provider_response
    response = self._engine.completion(
TypeError: litellm.main.completion() got multiple values for keyword argument 'model'

CODE:

judge_model = models.LiteLLMChatModel(
    model_name="qwen",
    base_url="http://127.0.0.1:8000/v1",
    api_key="00"
)

hallucination_metric = Hallucination(
    model=judge_model
)

ERROR:

     39     completion_kwargs: key-value arguments to always pass additionally into litellm.completion function.
     40     """
     42     super().__init__(model_name=model_name)
---> 44     self._check_model_name()
     45     self._check_must_support_arguments(must_support_arguments)
     47     self._completion_kwargs: Dict[str, Any] = (
     48         self._remove_unnecessary_not_supported_params(completion_kwargs)
     49     )

File ~/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py:101, in LiteLLMChatModel._check_model_name(self)
     99     _ = litellm.get_llm_provider(self.model_name)
    100 except litellm.exceptions.BadRequestError:
--> 101     raise ValueError(f"Unsupported model: '{self.model_name}'!")

ValueError: Unsupported model: 'qwen'!
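
For context, the "Unsupported model" error is raised because LiteLLM does not recognize the bare name "qwen": LiteLLM routes requests by a provider prefix in the model name. A sketch of a configuration that typically works against an OpenAI-compatible server follows; the openai/ prefix is the key change, and whether it matches this deployment is an assumption rather than something confirmed in this thread:

from opik.evaluation import models
from opik.evaluation.metrics import Hallucination

judge_model = models.LiteLLMChatModel(
    model_name="openai/qwen",  # provider prefix routes the call through LiteLLM's generic OpenAI-compatible handler
    base_url="http://127.0.0.1:8000/v1",
    api_key="00",  # local servers usually accept any non-empty key
)

hallucination_metric = Hallucination(model=judge_model)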

SafeCool · Jun 03 '25 03:06

model=judge_model

Please add this to the documentation or some examples - this is a nightmare!

danielstankw · Jun 30 '25 11:06

Changes are part of https://github.com/comet-ml/opik/pull/3440 - I believe this issue is now resolved. Feel free to re-open this issue if you have any ongoing problems and we will be more than happy to help you out.

vincentkoc · Nov 03 '25 17:11