
Json loads failed when LLM outputs contain boolean value

privacywill opened this issue 6 months ago • 6 comments

🐛 Bug Description

I used the Volcengine API as the chat API backend, and it always failed when the LLM reached this step. The error is something like Expecting value: line xx column xx (char xx), i.e. a JSON decode error.

{
    "final_decision": True,
    "final_feedback": "The factor '10D_Momentum_Normalized' is implemented correctly. The code executed successfully, the output file 'result.h5' is found, and the factor value dataframe meets all specified requirements (correct MultiIndex, column name, data type, and no infinite values). The daily data alignment and absence of errors confirm the implementation is accurate."
}

The cause of this error is that the response returned by the Volcengine API uses boolean values that begin with an uppercase letter, following Python's convention. This is not valid in standard JSON, so json.loads raises an error when parsing such responses.
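For reference, the failure is easy to reproduce locally. A minimal sketch (the payload below is a shortened stand-in for the actual model output shown above):

```python
import json

# Shortened stand-in for the model output above: Python-style boolean literal.
payload = '{"final_decision": True, "final_feedback": "ok"}'

try:
    json.loads(payload)
except json.JSONDecodeError as e:
    # e.g. "Expecting value: line 1 column 20 (char 19)"
    print("json.loads failed:", e)

# Lowercasing the literal to JSON's `true` makes the same text parse.
fixed = payload.replace("True", "true")
assert json.loads(fixed)["final_decision"] is True
```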

So I added a simple function to convert the boolean values in the all_response (it relies on import re):

def replace_true_false(self, s: str) -> str:
    def repl(match: re.Match) -> str:
        return match.group(0).lower()

    return re.sub(r"\b(True|False|TRUE|FALSE)\b", repl, s)

This is just a temporary solution. Hope for a better approach.

To Reproduce

Steps to reproduce the behavior:

rdagent fin_factor

privacywill avatar May 30 '25 07:05 privacywill

You can change True to true and False to false in the JSON output template in the code.

inevity avatar May 30 '25 09:05 inevity

Yes, you are right. But I am curious why the output returned by OpenAI can be parsed by json.loads. Am I the only one who has met this issue?

privacywill avatar May 30 '25 09:05 privacywill

If you change all "True/False" into "true/false", there will be another problem: when the LLM returns a string that contains code, like df.reset_index(drop=True), the code will be changed into wrong code, which will confuse the LLM.
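This pitfall is easy to demonstrate. A small sketch using the same blanket-regex approach as the workaround above (the JSON snippet is an illustrative stand-in):

```python
import re

def replace_true_false(s: str) -> str:
    # Blanket regex: lowercases every standalone True/False,
    # including ones inside JSON string values.
    return re.sub(r"\b(True|False|TRUE|FALSE)\b", lambda m: m.group(0).lower(), s)

snippet = '{"advice": "call df.reset_index(drop=True) first"}'
mangled = replace_true_false(snippet)
# The code inside the string value is corrupted: drop=True becomes drop=true.
```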

NathanWang7 avatar Jun 17 '25 09:06 NathanWang7

We have fixed this problem in rdagent/oai/backend/base.py via this function:

import io
import json
import re
import tokenize

def _fix_python_booleans(json_str: str) -> str:
    """Safely fix Python-style booleans to JSON standard format using tokenize"""
    replacements = {"True": "true", "False": "false", "None": "null"}

    try:
        out = []
        io_string = io.StringIO(json_str)
        tokens = tokenize.generate_tokens(io_string.readline)

        for toknum, tokval, _, _, _ in tokens:
            if toknum == tokenize.NAME and tokval in replacements:
                out.append(replacements[tokval])
            else:
                out.append(tokval)

        result = "".join(out)
        # Validate that the result is valid JSON before returning it
        json.loads(result)
        return result

    except (tokenize.TokenError, json.JSONDecodeError):
        # If tokenizing fails, fall back to the regex method
        for python_val, json_val in replacements.items():
            json_str = re.sub(rf"\b{python_val}\b", json_val, json_str)
        return json_str
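A self-contained sanity check of this approach (restating the function so the sketch runs on its own). The key property is that the tokenizer treats a string value as a single token, so literals inside strings survive untouched:

```python
import io
import json
import re
import tokenize

def _fix_python_booleans(json_str: str) -> str:
    """Convert Python-style literals to JSON using the Python tokenizer."""
    replacements = {"True": "true", "False": "false", "None": "null"}
    try:
        out = []
        for toknum, tokval, _, _, _ in tokenize.generate_tokens(io.StringIO(json_str).readline):
            if toknum == tokenize.NAME and tokval in replacements:
                out.append(replacements[tokval])
            else:
                out.append(tokval)
        result = "".join(out)
        json.loads(result)  # validate before returning
        return result
    except (tokenize.TokenError, json.JSONDecodeError):
        # Fallback: blanket regex replacement
        for py_val, js_val in replacements.items():
            json_str = re.sub(rf"\b{py_val}\b", js_val, json_str)
        return json_str

fixed = _fix_python_booleans('{"ok": True, "note": "df.reset_index(drop=True)", "x": None}')
# Bare literals become JSON (true/null); the True inside the string is preserved.
```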

Hope this helps.

Hoder-zyf avatar Jul 08 '25 13:07 Hoder-zyf

Actually, the JSON error with boolean values in responses is caused by two prompt files: rdagent/components/coder/factor_coder/prompts.yaml and rdagent/components/coder/model_coder/prompts.yaml. Some templates in those two files contain True instead of true, unlike most other prompt files, which leads the LLM to output True and raise the JSON error. After replacing True with true in the prompts, the JSON error never came up again.

NathanWang7 avatar Jul 09 '25 06:07 NathanWang7

That's true, but since prompt formats might vary in the future, I think checking and fixing it at runtime is a more robust and elegant solution. Thanks for pointing out the root cause in the prompts.

Hoder-zyf avatar Jul 09 '25 06:07 Hoder-zyf