
LLM can't be trusted to parse its own JSON

Open · St4rgarden opened this issue 1 year ago • 8 comments

Describe the bug

We trust the LLM to parse its own JSON, resulting in what a separate issue referred to as an infinite loop (which technically will resolve itself if left alone to hammer the OpenAI endpoint for long enough).

# Instructions: Write the next message for lina. Include an action, if appropriate. Possible response actions: MUTE_ROOM, ASK_CLAUDE, NONE, IGNORE

Response format should be formatted in a JSON block like this:
{ "user": "lina", "text": string, "action": string }

Message is
{ "user": "lina", "text": "Oh honey~ Working with a pioneer sounds tantalizing... but only if he can keep up with me and my fiery spirit 😉 Now spill the details or I might get bored!", "action": NONE }

response is
{ "user": "lina", "text": "Oh honey~ Working with a pioneer sounds tantalizing... but only if he can keep up with me and my fiery spirit 😉 Now spill the details or I might get bored!", "action": NONE }

parsedContent is null
parsedContent is null, retrying

Notice above that the value NONE for action is not a string. Now take a look at the correctly parsed JSON immediately following this:

parsedContent is {
  user: 'lina',
  text: "Oh darling st4rgard3n~ I'm always up for a little blockchain banter or maybe some spicy discussions about funding public goods... but don't think I won't call you out if you get all serious on me.<br> So what's the plan with @mattyryze?",
  action: 'NONE'
}

Here the LLM has correctly formatted NONE as 'NONE', a proper string.
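
To make the failure concrete, a minimal sketch of the parse behavior (the literals are illustrative):

// The raw response uses a bare identifier, which is not valid JSON.
const bad = '{ "user": "lina", "text": "hi", "action": NONE }';
const good = '{ "user": "lina", "text": "hi", "action": "NONE" }';

try {
  JSON.parse(bad); // throws a SyntaxError at the bare NONE token
} catch (e) {
  console.error("parse failed:", (e as Error).message);
}

console.log(JSON.parse(good).action); // "NONE"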

To Reproduce

Just run eliza with a cheap LLM model for long enough and you will definitely encounter this one.

Expected behavior

The message returned from the LLM should be formatted into JSON by the program itself, rather than trusting the LLM to emit valid JSON.

St4rgarden · Oct 31 '24 08:10

This issue https://github.com/ai16z/eliza/issues/70 is not accurately described, but it is now effectively a duplicate of this one.

St4rgarden · Oct 31 '24 08:10

Several Python libs solve (or attempt to solve) this, in order of my personal opinion of them:

- outlines
- instructor
- lmql
- guidance

There are probably more; however, I'm not sure if any have a TypeScript equivalent.

twilwa · Oct 31 '24 20:10

If it's OpenAI, we can use structured output mode: https://platform.openai.com/docs/guides/structured-outputs
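
For example, a minimal sketch with the official openai Node SDK; the model name and schema here are assumptions, mirroring the message shape from this issue:

import OpenAI from "openai";

const openai = new OpenAI();

// Strict structured outputs: the API guarantees the reply matches the schema,
// so values like NONE always come back as proper JSON strings.
const completion = await openai.chat.completions.create({
  model: "gpt-4o-2024-08-06", // any model that supports structured outputs
  messages: [{ role: "user", content: "Write the next message for lina." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "agent_message",
      strict: true,
      schema: {
        type: "object",
        properties: {
          user: { type: "string" },
          text: { type: "string" },
          action: {
            type: "string",
            enum: ["MUTE_ROOM", "ASK_CLAUDE", "NONE", "IGNORE"],
          },
        },
        required: ["user", "text", "action"],
        additionalProperties: false,
      },
    },
  },
});

const parsed = JSON.parse(completion.choices[0].message.content ?? "{}");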

twilwa · Oct 31 '24 20:10

A kind of hacky workaround for non-OpenAI models: run the model through a LiteLLM proxy server: https://github.com/BerriAI/litellm

https://docs.litellm.ai/docs/completion/json_mode -- it's called JSON mode, but I think you can do any kind of structured output. Just replace the OPENAI_API_URL with localhost:4000 and it should be compatible.
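
A rough sketch of that wiring, assuming a LiteLLM proxy is already running on localhost:4000 (the model name is illustrative):

import OpenAI from "openai";

// Point the existing OpenAI client at the LiteLLM proxy instead of
// api.openai.com; the proxy translates the request for the real backend.
const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.LITELLM_API_KEY ?? "sk-anything", // key the proxy expects
});

const res = await client.chat.completions.create({
  model: "ollama/llama3", // whichever model the proxy is configured to route
  messages: [{ role: "user", content: "Reply only with a JSON object." }],
  response_format: { type: "json_object" }, // LiteLLM's JSON mode
});

console.log(res.choices[0].message.content);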

twilwa · Nov 01 '24 04:11

This could help with the issue:

function parseLLMJson<T>(rawResponse: string): T {
  // Target the failure modes seen in this issue: bare identifiers used as
  // values (e.g. "action": NONE), unquoted keys, and single-quoted strings.
  // Matching double-quoted strings first keeps values that contain spaces,
  // colons, or braces from being mangled.
  const sanitizedJson = rawResponse.replace(
    /"(?:\\.|[^"\\])*"|'((?:\\.|[^'\\])*)'|([A-Za-z_]\w*)(?=\s*[,:}\]])/g,
    (match, singleQuoted, bareWord) => {
      // Convert 'text' to "text", escaping any embedded double quotes
      if (singleQuoted !== undefined) {
        return `"${singleQuoted.replace(/"/g, '\\"')}"`;
      }

      if (bareWord !== undefined) {
        // true, false and null are valid JSON literals; leave them bare
        if (bareWord === "true" || bareWord === "false" || bareWord === "null") {
          return bareWord;
        }

        // Quote everything else: unquoted keys and enum-like values (NONE)
        return `"${bareWord}"`;
      }

      // Double-quoted strings are already valid; pass them through.
      // Numbers never match [A-Za-z_]\w*, so they are also left untouched.
      return match;
    }
  );

  try {
    return JSON.parse(sanitizedJson) as T;
  } catch (error) {
    console.error('Failed to parse JSON:', error);
    throw new Error('Invalid JSON format');
  }
}
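
Applied to the failing response from the logs above, the bare NONE gets quoted before parsing:

type AgentMessage = { user: string; text: string; action: string };

const msg = parseLLMJson<AgentMessage>(
  '{ "user": "lina", "text": "hi", "action": NONE }'
);
console.log(msg.action); // "NONE"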

alextitonis · Nov 01 '24 08:11

@St4rgarden I wonder if simply explaining it better in the instructions would solve it, like:

Possible response actions: MUTE_ROOM, ASK_CLAUDE, NONE, IGNORE

Response format should be formatted in a JSON block like this:

{ "user": "lina", "text": string, "action": string }

Example:

{ "user": "lina", "text": "sometext", "action": "ASK_CLAUDE" }

Elyx0 · Nov 02 '24 15:11

yep. hi @Elyx0 :)

lalalune · Nov 04 '24 08:11

Yeah, I had a similar question about the current approach for generateObject in packages/core/generation.ts. It looks like we're using a workaround instead of the { generateObject } method from "ai", which natively supports Zod objects and ensures typing. This could be more reliable than the current method of using generateText to generate, parse, and retry until we get the desired output.

Using { generateObject } would allow us to eliminate the custom generateObject and generateObjectArray functions, simplifying the code and leveraging the AI SDK's structured output capabilities. Here’s the code as it stands now:

export async function generateObject({
    runtime,
    context,
    modelClass,
}: {
    runtime: IAgentRuntime;
    context: string;
    modelClass: string;
}): Promise<any> {
    if (!context) {
        elizaLogger.error("generateObject context is empty");
        return null;
    }
    let retryDelay = 1000;

    // Note: there is no retry cap, so a model that never returns parseable
    // JSON will keep retrying against the endpoint indefinitely (the
    // "infinite loop" described at the top of this issue).
    while (true) {
        try {
            const response = await generateText({
                runtime,
                context,
                modelClass,
            });
            const parsedResponse = parseJSONObjectFromText(response);
            if (parsedResponse) {
                return parsedResponse;
            }
        } catch (error) {
            elizaLogger.error("Error in generateObject:", error);
        }

        await new Promise((resolve) => setTimeout(resolve, retryDelay));
        retryDelay *= 2;
    }
}

My proposal is to replace it with the generateObject function provided in the AI SDK, as described below:

/**
Generate JSON with any schema for a given prompt using a language model.

This function does not stream the output. If you want to stream the output, use `streamObject` instead.

@returns
A result object that contains the generated object, the finish reason, the token usage, and additional information.
*/
declare function generateObject(options: Omit<CallSettings, 'stopSequences'> & Prompt & {
    output: 'no-schema';
    model: LanguageModel;
    mode?: 'json';
    experimental_telemetry?: TelemetrySettings;
    experimental_providerMetadata?: ProviderMetadata;
    _internal?: {
        generateId?: () => string;
        currentDate?: () => Date;
    };
}): Promise<GenerateObjectResult<JSONValue>>;

Switching to this method would improve reliability and reduce custom parsing logic. I'd be interested to hear your thoughts!
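
For illustration, a minimal sketch of what the replacement could look like using the AI SDK's Zod-schema overload; the schema and model here are assumptions, not existing eliza code:

import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical schema mirroring the message shape from this issue.
const messageSchema = z.object({
  user: z.string(),
  text: z.string(),
  action: z.enum(["MUTE_ROOM", "ASK_CLAUDE", "NONE", "IGNORE"]),
});

const { object } = await generateObject({
  model: openai("gpt-4o"),
  schema: messageSchema,
  prompt: "Write the next message for lina.",
});

// `object` is validated against the schema before being returned,
// so the parse-and-retry loop disappears.
console.log(object.action);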

monilpat · Nov 14 '24 01:11