semantic-kernel Consider Checking for Malformed {{ and }} brackets and Give Nice Error in prompt_template

https://github.com/microsoft/semantic-kernel/blob/8590ec5e446715e4ec8e0cd7aa59bb54e3f2d09a/python/semantic_kernel/template_engine/prompt_template_engine.py#L161-L172

Consider adding in a check of open / closed {{ and }} to give a nice error clue:

Possible solution if leading }} is valid:

    while cursor < len(template):
        # ... (previous code)

        # When '{{' is found
        if _get_char() == STARTER and _get_char(1) == STARTER:
            if start_found:
               raise TemplateException(
               TemplateException.ErrorCodes.SyntaxError,
               "Unmatched '{{' and '}}' brackets found in the template."
            )

            start_pos = cursor
            start_found = True

or catch any mismatch

    open_bracket_count = 0

    while cursor < len(template):

        # When '{{' is found
        if _get_char() == STARTER and _get_char(1) == STARTER:

            open_bracket_count += 1
        # When '}}' is found
        elif get_char() == ENDER and _get_char(1) == ENDER:

            open_bracket_count -= 1

            if start_found:
                 # Logic

        # Move the cursor forward
        cursor += 1

    # Check for unmatched brackets
    if open_bracket_count != 0:
        raise TemplateException(
            TemplateException.ErrorCodes.SyntaxError,
            "Unmatched '{{' and '}}' brackets found in the template."
        )

Mar 21 '23 00:03 ghadlich

@ghadlich thanks for looking that deep into the template logic :-) The current approach was made on purpose to reduce the need for handling special chars. E.g. templates like

{{ text here

and

txt here }}

are considered valid "text" and used as is. This approach allows to parse templates very quickly (runtime cost), and minimizes the need for special syntax (see notes here https://github.com/microsoft/semantic-kernel/blob/main/docs/PROMPT_TEMPLATE_LANGUAGE.md).

In your case, to avoid errors, one would be forced to do this:

{{ "{{" }} text here

and

txt here {{ "}}" }}

If I understand your suggestion correctly, you would prefer intentional extra syntax over unintentional errors, and that's a fair point. I don't think we're going to make this change now, but I see a few options we could brainstorm (I like the first most):

provide tools to validate prompt syntax, e.g. methods like kernel.ValidateFunctionsSyntax
allow to configure the template engine, e.g. strict vs non-strict (without breaking backward compatibility)
compiling prompts, then the cost of parsing can be reduced, and extra validation would not be too taxing (though this depends on how the kernel is used, and which runtime one would use (python, c#, typescript, etc)

Mar 21 '23 04:03 dluc

Thanks for the response, the grammar intro was helpful. Is the python version going to check for the quote like in the C# parser?

https://github.com/microsoft/semantic-kernel/blob/a80668598bbb4f077be9344d93bd08be38ea3114/dotnet/src/SemanticKernel/TemplateEngine/TemplateTokenizer.cs#L124

The python version will parse differently for {{ "}}" }}, instead of }} this would parse as " and " }}, right? As there is no fast forwarding when seeing ".

Mar 21 '23 08:03 ghadlich

The python and typescript branches do not support the new syntax yet, you're correct, so they can fail to parse invalid syntax. The support for "values" is a new feature we added only recently, so it's a matter of a couple of weeks I think until it's available also in python.

Mar 23 '23 05:03 dluc

@ghadlich This PR updates the prompt templating engine for Python! https://github.com/microsoft/semantic-kernel/pull/200

I'd invite you to check it out and give feedback!

Mar 29 '23 03:03 alexchaomander

This is now in the python branch.

Apr 04 '23 15:04 evchaki

Consider Checking for Malformed {{ and }} brackets and Give Nice Error in prompt_template_engine.py