ValueError: Expected dict_keys(['answer']) but got dict_keys([])

Open rangehow opened this issue 1 year ago • 35 comments

I ran the official code example from intro.ipynb:

import dspy

lm = dspy.LM(model='openai/default', api_key=" ", api_base=" ", temperature=0.9, max_tokens=3000)
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.settings.configure(lm=lm, rm=colbertv2_wiki17_abstracts)

from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

train_example = trainset[0]
print(f"Question: {train_example.question}")
print(f"Answer: {train_example.answer}")

dev_example = devset[18]
print(f"Question: {dev_example.question}")
print(f"Answer: {dev_example.answer}")
print(f"Relevant Wikipedia Titles: {dev_example.gold_titles}")


class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
    
# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")

The error I get:

Traceback (most recent call last):
  File "/mnt/rangehow/dspy/1.py", line 39, in <module>
    pred = generate_answer(question=dev_example.question)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/rangehow/dspy/dspy/predict/predict.py", line 98, in __call__
    return self.forward(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/rangehow/dspy/dspy/predict/predict.py", line 131, in forward
    completions = v2_5_generate(lm, config, signature, demos, kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/rangehow/dspy/dspy/predict/predict.py", line 231, in v2_5_generate
    return adapter(lm, lm_kwargs=lm_kwargs, signature=signature, demos=demos, inputs=inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/rangehow/dspy/dspy/adapters/base.py", line 11, in __call__
    value = self.parse(signature, output)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/rangehow/dspy/dspy/adapters/chat_adapter.py", line 58, in parse
    raise ValueError(f"Expected {signature.output_fields.keys()} but got {fields.keys()}")
ValueError: Expected dict_keys(['answer']) but got dict_keys([])

The output from the LLM, which I printed out:

['answer: American\n[[ ## completed ## ]]']

rangehow avatar Sep 25 '24 12:09 rangehow

https://github.com/stanfordnlp/dspy/blob/cc96b7a61375ebddc9896245e10b96cb49e2e27c/dspy/adapters/chat_adapter.py#L47 [(None, 'answer: American'), ('completed', '')]
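
For anyone following along, here is a rough, self-contained approximation (not the actual dspy source, just an illustration of the behavior) of how the chat adapter splits a completion into [[ ## field ## ]] sections. Because the model wrote "answer: American" with no header, that text ends up under a None key and the parsed field dict comes back empty:

import re

# Rough stand-in for the adapter's section splitting (not dspy's real code).
completion = "answer: American\n[[ ## completed ## ]]"

header = re.compile(r"\[\[ *#* *(\w+) *#* *\]\]")
sections, last_field, last_end = [], None, 0
for m in header.finditer(completion):
    sections.append((last_field, completion[last_end:m.start()].strip()))
    last_field, last_end = m.group(1), m.end()
sections.append((last_field, completion[last_end:].strip()))

print(sections)
# [(None, 'answer: American'), ('completed', '')]
# No section is labeled 'answer', so the adapter is left with an empty field
# dict and raises: Expected dict_keys(['answer']) but got dict_keys([]).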

rangehow avatar Sep 25 '24 12:09 rangehow

I'm attaching to the same issue, since the message is the same, but the scenario is different. This message occurs when running the COPRO optimizer. The problem is that the model might exceed the output token limit: the generated instruction is too long, and the truncated output does not include the expected sections.

I would recommend including logic in the optimizer that ignores outputs/generations that don't include the necessary fields rather than failing the whole optimization. This should be a fairly basic improvement.
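
A minimal sketch of what that could look like at the program level today, assuming the BasicQA signature from the snippet above; SafeQA is a made-up name, and this is a user-side workaround rather than a change to COPRO itself:

import dspy

class SafeQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.Predict(BasicQA)  # BasicQA from the snippet above

    def forward(self, question):
        try:
            return self.generate_answer(question=question)
        except ValueError:
            # e.g. "Expected dict_keys(['answer']) but got dict_keys([])":
            # return an empty prediction so the metric scores it as a miss
            # instead of the whole optimization run crashing.
            return dspy.Prediction(answer="")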

apohllo avatar Sep 25 '24 13:09 apohllo

Hey @rangehow ! Thanks for opening this.

What is openai/default? What LM is this connecting to? It seems like that LM fails to respect the output format, but I can look more closely if I know more.

okhat avatar Sep 25 '24 15:09 okhat

Hey @rangehow ! Thanks for opening this.

What is openai/default? What LM is this connecting to? It seems like that LM fails to respect the output format, but I can look more closely if I know more.

Thanks for your reply, I use Qwen2.5-7B-instruct hosted by sglang

rangehow avatar Sep 26 '24 06:09 rangehow

In my case it's speakleash/Bielik-11B-v2.2-Instruct-GPTQ and the model is not producing correct output - the markers are missing in the output of the model.

apohllo avatar Sep 26 '24 13:09 apohllo

Ah, thank you both! OK it seems like these models don't stick to the format described in the instructions too well.

One thing that may help is using a larger LM (like gpt-4o-mini) to teach the small LM by showing it examples. This can be super cheap because the larger LM is only invoked very few times.

Learn more about this: https://github.com/stanfordnlp/dspy/blob/main/examples/nli/scone/scone.ipynb

That said, I will look into this class of models and try to see how we can ensure they're supported out of the box.
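
A minimal sketch of that teacher setup, assuming the BasicQA signature and trainset from the snippet above; the model names and the exact_match metric are placeholders for your own:

import dspy

student_lm = dspy.LM(model='openai/default', api_base='...', api_key='...')  # your small local model
teacher_lm = dspy.LM(model='openai/gpt-4o-mini')                              # larger teacher

dspy.configure(lm=student_lm)

def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

# The teacher LM is invoked only to bootstrap a handful of demonstrations;
# the compiled program still runs entirely on the small student LM.
optimizer = dspy.BootstrapFewShot(
    metric=exact_match,
    max_bootstrapped_demos=4,
    teacher_settings=dict(lm=teacher_lm),
)
compiled_qa = optimizer.compile(dspy.Predict(BasicQA), trainset=trainset)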

okhat avatar Sep 26 '24 14:09 okhat

I think a nice addition that could work is having the model auto-fix its own output, i.e., if a field is wrongly structured, ask the model to fix it. This should be straightforward to implement.
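
A rough sketch of that self-correction idea, again assuming the BasicQA signature from above; FixFormat and SelfFixingQA are made-up names, this is not an existing dspy feature, and the exception text is only a crude stand-in for the raw completion:

import dspy

class FixFormat(dspy.Signature):
    """Restate a wrongly formatted answer so it contains only the required fields."""

    question = dspy.InputField()
    malformed = dspy.InputField(desc="the previous, wrongly structured completion")
    answer = dspy.OutputField(desc="often between 1 and 5 words")

class SelfFixingQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.Predict(BasicQA)
        self.fix = dspy.Predict(FixFormat)

    def forward(self, question):
        try:
            return self.answer(question=question)
        except ValueError as err:
            # One repair attempt: show the model the parse error and ask it
            # to produce the required fields again.
            return self.fix(question=question, malformed=str(err))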

apohllo avatar Sep 28 '24 01:09 apohllo

@rangehow did you notice this with v2.4.16 or v2.4.17? I've upgraded to 2.5.2 and 2.5.3 and am noticing the same issue with qwen-7b with ollama.

NumberChiffre avatar Oct 03 '24 18:10 NumberChiffre

Tagging @isaacbmiller since he's exploring this sort of thing nowadays.

okhat avatar Oct 04 '24 15:10 okhat

@okhat should predictors include retries by default?

I think the formatting change to the system prompt we were talking about will help with outputting [[ ## answer ## ]] instead of answer:

isaacbmiller avatar Oct 04 '24 16:10 isaacbmiller

Hello, I am using Phi-3-mini-128k-instruct and ran into the same problem as previously described. :-/ If there is anything I can do to help test possible solutions, please let me know. I'm using the 'new' adapter (dspy.LM) through an OpenAI-compatible server (i.e., text-generation-inference). Thank you.

I tested with OpenAI gpt-4-mini and it worked fine.

cielonet avatar Oct 05 '24 01:10 cielonet

@cielonet if you could try pointing to the mm-adapter branch and letting me know if the problem still occurs that would be extremely helpful.

If not I can try it myself sometime tomorrow.

isaacbmiller avatar Oct 05 '24 01:10 isaacbmiller

@cielonet if you could try pointing to the mm-adapter branch and letting me know if the problem still occurs that would be extremely helpful.

If not I can try it myself sometime tomorrow.

Will do this now! I should get back to you shortly! Thank you. :-) :+1:

cielonet avatar Oct 05 '24 01:10 cielonet

Hey @isaacbmiller, I tried your changes and I'm still getting the same issue; however, I noticed this code change does show the LLM response by default. I have attached the output. Generally I would do this in a notebook, but I am testing DSPy with Airflow, so the output below shows the result of executing a task. I have attached the code as well for reference.

01:47:25 - LiteLLM:INFO: utils.py:2983 - 
LiteLLM completion() model= tgi; provider = openai
[2024-10-05T01:47:25.797+0000] {utils.py:2983} INFO - 
LiteLLM completion() model= tgi; provider = openai
[2024-10-05T01:47:26.830+0000] {_client.py:1026} INFO - HTTP Request: POST http://192.168.1.152:8085/v1/chat/completions "HTTP/1.1 200 OK"
01:47:26 - LiteLLM:INFO: utils.py:999 - Wrapper: Completed Call, calling success_handler
[2024-10-05T01:47:26.839+0000] {utils.py:999} INFO - Wrapper: Completed Call, calling success_handler
Expected dict_keys(['reasoning', 'output']) but got dict_keys([]) from [[reasoning]] 9.9 is larger than 9.11 because when comparing decimal numbers, the number with the greater value in the tenths place and beyond is the larger number. Here, both numbers have "9" in the ones place, but in the tenths place, 9.9 has a "9" compared to 9.11 which has a "1". Additionally, 9.9 has a "9" in the hundredths place, whereas 9.11 has a "1" in the hundredths place, further confirming that 9.9 is larger.

[[output]] 9.9

[[completed]]
2024-10-05 01:47:26,844] {taskinstance.py:3310} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/decorators/base.py", line 266, in execute
    return_value = super().execute(context)
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/operators/python.py", line 238, in execute
    return_value = self.execute_callable()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/operators/python.py", line 256, in execute_callable
    return runner.run(*self.op_args, **self.op_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/dags/includes/lib/extract/calls.py", line 356, in use_cot
    response = qa(input=question)
               ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dspy/primitives/program.py", line 26, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dspy/predict/chain_of_thought.py", line 44, in forward
    return self._predict(signature=signature, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dspy/predict/predict.py", line 106, in __call__
    return self.forward(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dspy/predict/predict.py", line 139, in forward
    completions = v2_5_generate(lm, config, signature, demos, kwargs, _parse_values=self._parse_values)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dspy/predict/predict.py", line 245, in v2_5_generate
    return adapter(lm, lm_kwargs=lm_kwargs, signature=signature, demos=demos, inputs=inputs, _parse_values=_parse_values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dspy/adapters/base.py", line 30, in __call__
    value = self.parse(signature, output, _parse_values=_parse_values)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 66, in parse
    raise ValueError(f"Expected {signature.output_fields.keys()} but got {fields.keys()}")
ValueError: Expected dict_keys(['reasoning', 'output']) but got dict_keys([])

import litellm
import dspy
litellm.caching = False
litellm.telemetry = False


llm = dspy.LM(model=MODEL, api_base=BASE_URL, api_key=API_KEY, temperature=1.0, cache=False, model_type='chat')
dspy.configure(lm=llm)
dspy.settings.experimental = True

@task
def use_cot(query: list[dict[str,str]]) -> str:
    
    # Extract 'content' from each dictionary in the list
    question = '\n'.join(item['content'] for item in query) # query = [{'role': 'user', 'content': 'What number is larger 9.11 or 9.9?'}] -> 'What number is larger 9.11 or 9.9?\n'

    qa = dspy.ChainOfThought('input -> reasoning, output')
    response = qa(input=question)
    print(f"RESPONSE: {response}")
    
    return "Done" # temp holder until we can fix issue!

Installation Procedure

pip3 install git+https://github.com/stanfordnlp/dspy.git@mm-adapter

Note: I didn't include it here, but I also tested with the dspy.Retry module using try/except and I still get a ValueError with the output shown.

cielonet avatar Oct 05 '24 01:10 cielonet

Sweet thanks for sending this. Will look more into it tomorrow

isaacbmiller avatar Oct 05 '24 01:10 isaacbmiller

Sweet thanks for sending this. Will look more into it tomorrow

Hey there, Just wanted to see if there were any new updates on this issue? If there is anything I can do to assist with testing, etc., please let me know. Thank you.

cielonet avatar Oct 09 '24 16:10 cielonet

I ran my code with v2.5.6 on Python 3.12 and am still getting a similar error to the author's: https://github.com/NumberChiffre/mcts-llm/blob/main/notebooks/evaluate.ipynb

All good with 2.4.17; I swapped to 2.5.6 and get the following with evaluate(ZeroShotCoT()), with the same outcome for ChainOfThought or TypedChainOfThought:

Average Metric: 14 / 23  (60.9):  64%|██████▍   | 23/36 [00:17<00:11,  1.09it/s] ERROR:dspy.evaluate.evaluate:2024-10-11T05:26:41.462176Z [error    ] Error for example in dev set: 		 Expected dict_keys(['reasoning', 'answer']) but got dict_keys(['reasoning']). Set `provide_traceback=True` to see the stack trace. [dspy.evaluate.evaluate] filename=evaluate.py lineno=198
Average Metric: 19.0 / 35  (54.3):  97%|█████████▋| 35/36 [00:46<00:02,  2.21s/it]ERROR:dspy.evaluate.evaluate:2024-10-11T05:27:25.933474Z [error    ] Error for example in dev set: 		 Expected dict_keys(['reasoning', 'answer']) but got dict_keys(['reasoning']). Set `provide_traceback=True` to see the stack trace. [dspy.evaluate.evaluate] filename=evaluate.py lineno=198
Average Metric: 19.0 / 36  (52.8): 100%|██████████| 36/36 [01:04<00:00,  1.79s/it]

NumberChiffre avatar Oct 11 '24 05:10 NumberChiffre

Just a quick update. I re-tested this morning with the latest version of dspy==2.5.9 and the problem seems to have gone away. I will continue to do more testing. Perhaps #1630 fixed the issue?!

Also another change I made was now I am using vllm as opposed to tgi as the backend with phi3.5-mini. Not sure if this was the issue?

cielonet avatar Oct 15 '24 14:10 cielonet

Just a quick update. I re-tested this morning with the latest version of dspy==2.5.9 and the problem seems to have gone away. I will continue to do more testing. Perhaps #1630 fixed the issue?!

Also another change I made was now I am using vllm as opposed to tgi as the backend with phi3.5-mini. Not sure if this was the issue?

Thanks for the update. I'm running 2.5.7 and even tried qwen2.5:14b-instruct, with the same issue. I managed to get around this problem by adding a docstring (using the prompt instructions from before MIPROv2 crashed) to the dspy.Signature that was causing it, and I'm now able to run MIPROv2 without any errors. It turns out that not having a docstring on the dspy.Signature was the problem when using litellm. Now even qwen2.5:7b-instruct works.

NumberChiffre avatar Oct 16 '24 04:10 NumberChiffre

  • Still getting a similar error to the author's on the latest v2.5.10 with Python 3.10.
  • Code copied from: https://dspy-docs.vercel.app/docs/deep-dive/optimizers/miprov2
Optimizing program with MIPRO...

==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
These will be used as few-shot example candidates for our program and for creating instructions.

Bootstrapping N=7 sets of demonstrations...
Bootstrapping set 1/7
...

==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.

Proposing instructions...
...
Evaluating the default program...
...
== Minibatch Trial 5 / 15 ==
Average Metric: 4 / 25  (16.0): 100%|██████████| 25/25 [00:32<00:00,  1.31s/it]
Score: 16.0 on minibatch of size 25 with parameters ['Predictor 1: Instruction 6', 'Predictor 1: Few-Shot Set 1'].
Minibatch scores so far: [36.0, 32.0, 32.0, 16.0, 16.0]
Full eval scores so far: [31.25]
Best full score so far: 31.25
============================


== Minibatch Trial 6 / 15 ==
  0%|          | 0/25 [00:00<?, ?it/s]2024-10-16T09:56:22.099382Z [error    ] Error for example in dev set: 		 Expected dict_keys(['reasoning', 'answer']) but got dict_keys(['reasoning']). Set `provide_traceback=True` to see the stack trace. [dspy.evaluate.evaluate] filename=evaluate.py lineno=198
2024-10-16T09:56:22.106825Z [error    ] Error for example in dev set: 		 Expected dict_keys(['reasoning', 'answer']) but got dict_keys(['reasoning']). Set `provide_traceback=True` to see the stack trace. [dspy.evaluate.evaluate] filename=evaluate.py lineno=198
...

chansonZ avatar Oct 16 '24 10:10 chansonZ

We are working on a larger scale workaround for issues like this.

For now, expect that the 2.5 predictors will error more loudly than before. Depending on how large an LLM I am using, I temper my expectations of how well it can follow the required format.

There are two possible things I do here:

  1. Pass max_errors=10000 to your optimizer or evaluation function (see the sketch below). Performance should still equal or exceed pre-2.5.
  2. Few-shot examples really help small models understand the format. E.g., Llama 3.2 1B struggles with the format (~2%), but when shown few-shot examples it understands the format much better.
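
A minimal sketch of tip 1, assuming program, devset, trainset, and exact_match are your own module, data, and metric:

import dspy
from dspy.evaluate import Evaluate

# Tolerate parse failures during evaluation instead of aborting the whole run.
evaluate = Evaluate(devset=devset, metric=exact_match, num_threads=8,
                    display_progress=True, max_errors=10000)
evaluate(program)

# Optimizers accept the same knob.
optimizer = dspy.BootstrapFewShot(metric=exact_match, max_errors=10000)
compiled = optimizer.compile(program, trainset=trainset)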

isaacbmiller avatar Oct 16 '24 17:10 isaacbmiller

Datapoint: dolphin-llama3 struggles with the output format, but llama3 does not.

Harrolee avatar Oct 16 '24 18:10 Harrolee

Just confirmed this issue is still a problem with phi3.5. :-(

cielonet avatar Oct 18 '24 04:10 cielonet

Following as I'm having the same issue, using gpt-4o-mini.

ValueError: Expected dict_keys(['advice', 'categories']) but got dict_keys(['advice'])
Average Metric: 66.34999999999998 / 184  (36.1):  62%|██████▏   | 184/299 [00:19<00:12,  9.27it/s] 

paulacanva avatar Oct 24 '24 19:10 paulacanva

Hey everyone, can you:

  1. Upgrade to DSPy latest, i.e. v2.5.18
  2. Use dspy.Predict or dspy.ChainOfThought instead of dspy.TypedPredictor.
  3. Let me know here if the issue is not solved.

I think things are better now for most models.

okhat avatar Oct 27 '24 16:10 okhat

So I did some testing on v2.5.18 with ms phi3.5 and found something interesting.

When my code looks like this -> qa = dspy.ChainOfThought('input -> reasoning, output') I always get an error ValueError: Expected dict_keys(['reasoning', 'output']) but got dict_keys(['reasoning'])

But when I add just a single space to the end of the signature string in the ChainOfThought call, like this: qa = dspy.ChainOfThought('input -> reasoning, output ')

I always get a solid response back!

RESPONSE: Prediction(
    reasoning="To determine which number is larger, we compare the numbers digit by digit starting from the leftmost (highest value) digit. Both numbers have '9' in the ones place, so we move to the next digit to the right, which is the tenths place. In 9.11, the tenths digit is '1', and in 9.9, the tenths digit is '9'. Since '9' is larger than '1', we can conclude that 9.9 is larger than 9.11 without needing to compare further digits.",
    output='9.9'
)

Interestingly, if I add three parts to the output, like qa = dspy.ChainOfThought('input -> reasoning , illustration , output')

only certain cases work, and not 100% of the time:

    # Doesn't work
    qa = dspy.ChainOfThought('input -> reasoning , illustration , output')
    qa = dspy.ChainOfThought('input -> reasoning, illustration, output')
    qa = dspy.ChainOfThought('input -> reasoning, illustration, output ')

    # Works, but not 100% of the time
    qa = dspy.ChainOfThought('input -> reasoning, illustration , output')
    qa = dspy.ChainOfThought('input -> reasoning, illustration , output ')
    qa = dspy.ChainOfThought('input -> reasoning,illustration, output ') 

These also seem to work really well:

qa = dspy.Predict('input -> reasoning: str , illustration: list[str] , output: float')
qa = dspy.Predict('input -> reasoning_steps: dict[str, str],output: float')
qa = dspy.Predict('input -> reasoning_steps: dict[str, str],ordered_output: list[float]')

RESPONSE: Prediction(
    reasoning_steps={'step1': 'Start by comparing the whole numbers in both numbers. Both numbers have the whole number 9, so we proceed to the next step, which is comparing decimal parts for deciding on the larger number when the whole numbers are equal or used for further comparison impacts when they differ as in this case.'},
    ordered_output=[9.9]
)

cielonet avatar Oct 28 '24 19:10 cielonet

As per another example provided by @okhat, adding type hints seems to have solved the issue for me. It might be worth updating the docs (such as this page) to make clear that DSPy leverages this kind of information.
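
For reference, a minimal sketch of what that typed signature can look like, reusing the advice/categories field names from the error message above; the class name and field descriptions are illustrative only:

import dspy

class AdviseAndCategorize(dspy.Signature):
    """Give advice for the input text and assign it one or more categories."""

    text: str = dspy.InputField()
    advice: str = dspy.OutputField()
    categories: list[str] = dspy.OutputField(desc="one or more category labels")

predictor = dspy.ChainOfThought(AdviseAndCategorize)
# result = predictor(text="...")  # result.advice, result.categories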

paulacanva avatar Oct 29 '24 21:10 paulacanva

I'm just adding my two cents here. I was getting a similar error to the one above when using ollama and mistral-nemo:12b-instruct-2407-q8_0 (I'm not including most of my code):

mistral_nemo = ollama_llm_client("mistral-nemo:12b-instruct-2407-q8_0")

class ParsePassage(dspy.Signature):

    """You are an English professor."""

    narration = dspy.InputField(desc="Text from a student")
    answer = dspy.OutputField(desc="""Output must be in Json array in the format """)

Error: raise ValueError(f"Expected {signature.output_fields.keys()} but got {fields.keys()}") ValueError: Expected dict_keys(['reasoning', 'answer']) but got dict_keys([])

When I changed

answer = dspy.OutputField(desc="""Output must be in Json array in the format """)

to:

answer: dict = dspy.OutputField(desc="""Output must be in Json array in the format """)

the error went away. I hope this helps someone.

rahmed12 avatar Nov 03 '24 13:11 rahmed12

Hey everyone, can you:

1. Upgrade to DSPy latest, i.e. v2.5.18

2. Use `dspy.Predict` or `dspy.ChainOfThought` instead of `dspy.TypedPredictor`.

3. Let me know here if the issue is not solved.

I think things are better now for most models.

@okhat Can confirm I still get the issue with dspy.ChainOfThought on v2.5.29 for multiple models through ollama (Mixtral, Gemma2, GPT-4o, and Phi3).

jpmalone0 avatar Nov 12 '24 03:11 jpmalone0

This happens with any model that doesn't perform well with structured output.

falmanna avatar Nov 12 '24 14:11 falmanna