I deployed an LLM myself (Qwen) and expose it through an OpenAI-compatible interface. How should I evaluate this model's effectiveness? Why does the documentation only seem to cover GPT?
Hi @SafeCool ! Thanks for your question.
You can absolutely evaluate your own LLMs, such as Qwen, using the OpenAI-compatible API in Opik! While the documentation often refers to "GPT" as a default example, Opik is designed to work with any model that implements the OpenAI API standard—including your self-hosted models like Qwen.
How to evaluate your own LLM:
- Add your model to Opik using the OpenAI-compatible endpoint (just like you would for GPT).
- Set your model as the evaluation target in your Opik project or experiment.
- Use the same evaluation flows as shown in the docs for GPT—Opik will treat your model exactly the same.
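Putting those steps together, here is a minimal sketch (assuming a self-hosted server that exposes an OpenAI-compatible endpoint at http://127.0.0.1:8000/v1 and serves a model named "qwen"; adjust both to your deployment):

```python
import os

from openai import OpenAI
from opik import track
from opik.integrations.openai import track_openai

# Point the standard OpenAI client at the self-hosted endpoint.
client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key=os.environ.get("API_KEY", "unused"),  # many local servers accept any key
)
openai_client = track_openai(client)  # log every call as an Opik trace

@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model="qwen",
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content
```

From there you wrap the function in an evaluation task and pass it to `opik.evaluation.evaluate`, exactly as in the GPT examples in the docs.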
ERROR:
```
OPIK: Failed to compute metric hallucination_metric. Score result will be marked as failed.
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 725, in completion
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 627, in completion
    openai_client: OpenAI = self._get_openai_client(  # type: ignore
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 377, in _get_openai_client
    _new_client = OpenAI(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/_client.py", line 126, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 1854, in completion
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 1827, in completion
    response = openai_chat_completions.completion(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/llms/openai/openai.py", line 736, in completion
    raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/engine/engine.py", line 58, in _evaluate_test_case
    result = metric.score(**score_kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/opik/decorator/base_track_decorator.py", line 314, in wrapper
    raise func_exception
  File "/root/anaconda3/lib/python3.11/site-packages/opik/decorator/base_track_decorator.py", line 287, in wrapper
    result = func(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 91, in score
    model_output = self._model.generate_string(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 149, in generate_string
    response = self.generate_provider_response(
  File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 183, in generate_provider_response
    response = self._engine.completion(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/utils.py", line 1283, in wrapper
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/utils.py", line 1161, in wrapper
    result = original_function(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/main.py", line 3241, in completion
    raise exception_type(
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2239, in exception_type
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 378, in exception_type
    raise AuthenticationError(
litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
```
(The same three-part traceback repeats two more times in the log.)
CODE:
```python
from opik import Opik, track
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals, Hallucination
from opik.integrations.openai import track_openai
import openai
import os
from openai import OpenAI

# Define the task to evaluate
os.environ["OPIK_API_KEY"] = "xxx"
os.environ["OPIK_WORKSPACE"] = "xxx"

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1/",
    api_key="{}".format(os.environ.get("API_KEY", "0")),
)
openai_client = track_openai(client)

MODEL = "qwen"

@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content

# Define the evaluation task
def evaluation_task(x):
    return {"output": your_llm_application(x["input"])}

# Create a simple dataset
client = Opik()
dataset = client.get_or_create_dataset(name="Demo dataset")

# Define the metrics
hallucination_metric = Hallucination()

evaluation = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
    experiment_config={"model": MODEL},
)
```
Thanks for sharing the code!
The issue is likely that you're initializing the Hallucination metric without explicitly setting its LLM, so by default it uses an OpenAI GPT model (GPT-4o/3.5-turbo) under the hood, which requires `OPENAI_API_KEY`.
Could you try manually specifying the judge model, like this:
```python
from opik.evaluation import models

judge_model = models.LiteLLMChatModel(
    model_name="qwen",  # or any other local model
    base_url="http://127.0.0.1:8000/v1",
    api_key="somekey",
)

hallucination = Hallucination(model=judge_model)
```
Please let us know if that helps!
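For completeness, the custom judge then slots into the same `evaluate` call from your snippet (a sketch reusing the `dataset`, `evaluation_task`, and `MODEL` names defined above):

```python
evaluation = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination],  # the judge now calls the local server
    experiment_config={"model": MODEL},
)
```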
code:
```python
judge_model = models.LiteLLMChatModel(
    model="qwen",
    base_url="http://127.0.0.1:8000/v1",
    api_key="00",
)

hallucination_metric = Hallucination(model=judge_model)
```
error:
```
File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/metrics/llm_judges/hallucination/metric.py", line 91, in score
    model_output = self._model.generate_string(
File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 149, in generate_string
    response = self.generate_provider_response(
File "/root/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py", line 183, in generate_provider_response
    response = self._engine.completion(
TypeError: litellm.main.completion() got multiple values for keyword argument 'model'
```
code:
```python
judge_model = models.LiteLLMChatModel(
    model_name="qwen",
    base_url="http://127.0.0.1:8000/v1",
    api_key="00",
)

hallucination_metric = Hallucination(model=judge_model)
```
error:
```
     39 completion_kwargs: key-value arguments to always pass additionally into litellm.completion function.
     40 """
     42 super().__init__(model_name=model_name)
---> 44 self._check_model_name()
     45 self._check_must_support_arguments(must_support_arguments)
     47 self._completion_kwargs: Dict[str, Any] = (
     48     self._remove_unnecessary_not_supported_params(completion_kwargs)
     49 )

File ~/anaconda3/lib/python3.11/site-packages/opik/evaluation/models/litellm/litellm_chat_model.py:101, in LiteLLMChatModel._check_model_name(self)
     99     _ = litellm.get_llm_provider(self.model_name)
    100 except litellm.exceptions.BadRequestError:
--> 101     raise ValueError(f"Unsupported model: '{self.model_name}'!")

ValueError: Unsupported model: 'qwen'!
```
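For reference, the two failures above have distinct causes: the TypeError occurs because `LiteLLMChatModel` already forwards its `model_name` as the `model` argument to `litellm.completion`, so an extra `model=` kwarg collides with it; the ValueError occurs because LiteLLM resolves the provider from a prefix in the model name, and a bare `"qwen"` matches no known provider. A sketch that should work against an OpenAI-compatible server, assuming LiteLLM's generic `openai/` provider prefix (the prefix, URL, and key here are illustrative):

```python
from opik.evaluation import models
from opik.evaluation.metrics import Hallucination

# "openai/qwen" tells LiteLLM to use its OpenAI-compatible client for a model
# named "qwen", routing requests to base_url instead of api.openai.com.
judge_model = models.LiteLLMChatModel(
    model_name="openai/qwen",
    base_url="http://127.0.0.1:8000/v1",
    api_key="00",  # most self-hosted servers accept any non-empty key
)

hallucination_metric = Hallucination(model=judge_model)
```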
Please add this to the documentation or some examples - this is a nightmare!
Changes are part of https://github.com/comet-ml/opik/pull/3440 - I believe this issue is now resolved. Feel free to re-open it if you have any ongoing issues and we will be more than happy to help you out.