
I deployed my own LLM (Qwen) behind an OpenAI-compatible interface. How should I evaluate this model? Why does the documentation only seem to cover GPT?

SafeCool opened this issue 6 months ago • 2 comments

SafeCool · May 29 '25 08:05

Hi @SafeCool! Thanks for your question.

You can absolutely evaluate your own LLMs, such as Qwen, using the OpenAI-compatible API in Opik! While the documentation often refers to "GPT" as a default example, Opik is designed to work with any model that implements the OpenAI API standard—including your self-hosted models like Qwen.

How to evaluate your own LLM:

  • Add your model to Opik using the OpenAI-compatible endpoint (just like you would for GPT).
  • Set your model as the evaluation target in your Opik project or experiment.
  • Use the same evaluation flows as shown in the docs for GPT—Opik will treat your model exactly the same (see the sketch below).
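As a rough illustration, here is a minimal sketch of that flow, assuming a self-hosted OpenAI-compatible server at http://127.0.0.1:8000/v1 that serves a model registered as "qwen" (the endpoint, key, and model name are placeholders, not values from your setup):

import os
from openai import OpenAI
from opik import track
from opik.integrations.openai import track_openai

# Point the standard OpenAI client at the self-hosted endpoint instead of api.openai.com
client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key=os.environ.get("API_KEY", "dummy"),  # most local servers accept any non-empty key
)
openai_client = track_openai(client)  # every call is now logged as a trace in Opik

@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model="qwen",  # whatever name your server registers the model under
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content

From there you can wrap your_llm_application in an evaluation task and pass it to opik.evaluation.evaluate(), exactly as the GPT examples in the docs do.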

japdubengsub · May 29 '25 14:05

ERROR:

OPIK: Failed to compute metric hallucination_metric. Score result will be marked as failed.

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 725, in completion
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 627, in completion
    openai_client: OpenAI = self._get_openai_client(  # type: ignore
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 377, in _get_openai_client
    _new_client = OpenAI(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/_client.py", line 126, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 1854, in completion
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 1827, in completion
    response = openai_chat_completions.completion(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 736, in completion
    raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/engine/engine.py", line 58, in _evaluate_test_case
    result = metric.score(**score_kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/opik/decorator/base_track_decorator.py", line 314, in wrapper
    raise func_exception
  File "/root/anaconda3/lib/python3.11/site-packages/opik/decorator/base_track_decorator.py", line 287, in wrapper
    result = func(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 91, in score
    model_output = self._model.generate_string(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 149, in generate_string
    response = self.generate_provider_response(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 183, in generate_provider_response
    response = self._engine.completion(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/utils.py", line 1283, in wrapper
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/utils.py", line 1161, in wrapper
    result = original_function(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 3241, in completion
    raise exception_type(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2239, in exception_type
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 378, in exception_type
    raise AuthenticationError(
litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

[The same "Failed to compute metric hallucination_metric" exception chain is repeated two more times in the log.]

CODE:

from opik import Opik, track
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals, Hallucination
from opik.integrations.openai import track_openai
import openai
import os

from openai import OpenAI

# Define the task to evaluate
os.environ["OPIK_API_KEY"] = "xxx"
os.environ["OPIK_WORKSPACE"] = "xxx"

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1/",
    api_key="{}".format(os.environ.get("API_KEY", "0")),
)
openai_client = track_openai(client)

MODEL = "qwen"

@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": input}]
    )
    return response.choices[0].message.content

# Define the evaluation task
def evaluation_task(x):
    return {"output": your_llm_application(x['input'])}

# Create a simple dataset
client = Opik()
dataset = client.get_or_create_dataset(name="Demo dataset")

# Define the metrics
hallucination_metric = Hallucination()

evaluation = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
    experiment_config={"model": MODEL},
)

SafeCool · May 30 '25 02:05

Thanks for sharing the code!

The issue might be that you're initializing the Hallucination metric without explicitly setting the LLM, so by default, it uses OpenAI GPT-4o/3.5-turbo under the hood.

Could you try manually specifying the model, like this:

from opik.evaluation import models

judge_model = models.LiteLLMChatModel(
    model_name="qwen",  # or any other local model
    base_url="http://127.0.0.1:8000/v1",
    api_key="somekey"
)

hallucination = Hallucination(model=judge_model)
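
If it helps, a quick sanity check of the judge configuration might look like the snippet below; the example strings are made up, just to confirm the judge model responds before running a full evaluation:

result = hallucination.score(
    input="What is the capital of France?",
    output="The capital of France is Berlin.",
    context=["Paris is the capital of France."],
)
# ScoreResult: .value is the hallucination score, .reason is the judge's explanation
print(result.value, result.reason)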

Please let us know if that helps!

japdubengsub · May 30 '25 12:05

CODE:

judge_model = models.LiteLLMChatModel(
    model="qwen",
    base_url="http://127.0.0.1:8000/v1",
    api_key="00"
)

hallucination_metric = Hallucination(
    model=judge_model
)

ERROR:

  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 91, in score
    model_output = self._model.generate_string(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 149, in generate_string
    response = self.generate_provider_response(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 183, in generate_provider_response
    response = self._engine.completion(
TypeError: litellm.main.completion() got multiple values for keyword argument 'model'

CODE:

judge_model = models.LiteLLMChatModel(
    model_name="qwen",
    base_url="http://127.0.0.1:8000/v1",
    api_key="00"
)

hallucination_metric = Hallucination(
    model=judge_model
)

ERROR:

     39     completion_kwargs: key-value arguments to always pass additionally into litellm.completion function.
     40     """
     42     super().__init__(model_name=model_name)
---> 44     self._check_model_name()
     45     self._check_must_support_arguments(must_support_arguments)
     47     self._completion_kwargs: Dict[str, Any] = (
     48         self._remove_unnecessary_not_supported_params(completion_kwargs)
     49     )

File ~/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py:101, in LiteLLMChatModel._check_model_name(self)
     99     _ = litellm.get_llm_provider(self.model_name)
    100 except litellm.exceptions.BadRequestError:
--> 101     raise ValueError(f"Unsupported model: '{self.model_name}'!")

ValueError: Unsupported model: 'qwen'!
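
For context, the "Unsupported model" error is raised because LiteLLM does not recognize the bare name "qwen": LiteLLM routes requests by a provider prefix in the model name. A sketch of a configuration that typically works against an OpenAI-compatible server follows; the openai/ prefix is the key change, and whether it matches this deployment is an assumption rather than something confirmed in this thread:

from opik.evaluation import models
from opik.evaluation.metrics import Hallucination

judge_model = models.LiteLLMChatModel(
    model_name="openai/qwen",  # provider prefix routes the call through LiteLLM's generic OpenAI-compatible handler
    base_url="http://127.0.0.1:8000/v1",
    api_key="00",  # local servers usually accept any non-empty key
)

hallucination_metric = Hallucination(model=judge_model)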

SafeCool · Jun 03 '25 03:06

model=judge_model

Please add this to the documentation or some examples - this is a nightmare!

danielstankw · Jun 30 '25 11:06

Changes are part of https://github.com/comet-ml/opik/pull/3440 - I believe this issue is now resolved. Feel free to re-open this issue if you have any ongoing problems and we will be more than happy to help you out.

vincentkoc · Nov 03 '25 17:11