Consider Checking for Malformed {{ and }} brackets and Give Nice Error in prompt_template_engine.py
https://github.com/microsoft/semantic-kernel/blob/8590ec5e446715e4ec8e0cd7aa59bb54e3f2d09a/python/semantic_kernel/template_engine/prompt_template_engine.py#L161-L172
Consider adding a check for unmatched opening and closing {{ and }} so the engine can give a helpful error message:
Possible solution if leading }} is valid:
while cursor < len(template):
    # ... (previous code)
    # When '{{' is found
    if _get_char() == STARTER and _get_char(1) == STARTER:
        if start_found:
            raise TemplateException(
                TemplateException.ErrorCodes.SyntaxError,
                "Unmatched '{{' and '}}' brackets found in the template."
            )
        start_pos = cursor
        start_found = True
Or, to catch any mismatch:
open_bracket_count = 0
while cursor < len(template):
    # When '{{' is found
    if _get_char() == STARTER and _get_char(1) == STARTER:
        open_bracket_count += 1
    # When '}}' is found
    elif _get_char() == ENDER and _get_char(1) == ENDER:
        open_bracket_count -= 1
    if start_found:
        # Logic
        pass
    # Move the cursor forward
    cursor += 1
# Check for unmatched brackets
if open_bracket_count != 0:
    raise TemplateException(
        TemplateException.ErrorCodes.SyntaxError,
        "Unmatched '{{' and '}}' brackets found in the template."
    )
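For reference, here is a self-contained sketch of the same idea that runs outside the engine. It is not the engine's code: `TemplateSyntaxError` and `check_balanced_delimiters` are hypothetical names standing in for `TemplateException` and the tokenizer loop, and it tracks the position of the open `{{` instead of a bare counter so the error can say where the problem is. It does not know about quoted values.

```python
class TemplateSyntaxError(ValueError):
    """Hypothetical stand-in for TemplateException with a SyntaxError code."""


def check_balanced_delimiters(template: str) -> None:
    """Raise TemplateSyntaxError unless every '{{' is closed by a later '}}'."""
    open_pos = None  # index of the currently open '{{', or None
    cursor = 0
    while cursor < len(template) - 1:
        pair = template[cursor:cursor + 2]
        if pair == "{{":
            if open_pos is not None:
                raise TemplateSyntaxError(
                    "'{{' at index %d found before the '{{' at index %d was closed."
                    % (cursor, open_pos)
                )
            open_pos = cursor
            cursor += 2
        elif pair == "}}":
            if open_pos is None:
                raise TemplateSyntaxError(
                    "'}}' at index %d has no matching '{{'." % cursor
                )
            open_pos = None
            cursor += 2
        else:
            cursor += 1
    if open_pos is not None:
        raise TemplateSyntaxError(
            "'{{' at index %d is never closed." % open_pos
        )


# A well-formed template passes silently.
check_balanced_delimiters("Hello {{$name}}")
```

Note that this check rejects both `{{ text here` and `txt here }}`, so it only makes sense if those are meant to be errors.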
@ghadlich thanks for looking that deep into the template logic :-) The current approach was made on purpose to reduce the need for handling special chars. E.g. templates like
{{ text here
and
txt here }}
are considered valid "text" and used as is. This approach lets templates be parsed very quickly (low runtime cost) and minimizes the need for special syntax (see notes here https://github.com/microsoft/semantic-kernel/blob/main/docs/PROMPT_TEMPLATE_LANGUAGE.md).
With your proposal, to avoid errors, one would be forced to write this:
{{ "{{" }} text here
and
txt here {{ "}}" }}
If I understand your suggestion correctly, you would prefer intentional extra syntax over unintentional errors, and that's a fair point. I don't think we're going to make this change now, but I see a few options we could brainstorm (I like the first most):
- provide tools to validate prompt syntax, e.g. methods like kernel.ValidateFunctionsSyntax
- allow configuring the template engine, e.g. strict vs non-strict (without breaking backward compatibility)
- compile prompts, so that the cost of parsing is reduced and extra validation is not too taxing (though this depends on how the kernel is used and on which runtime one uses: python, c#, typescript, etc.)
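To make the strict vs non-strict option concrete, here is a rough sketch. Everything in it is an assumption for illustration: `split_template` and its `strict` flag are hypothetical, not an existing kernel API, and the scanning is a simplification of what the engine does. In non-strict mode an unmatched `{{` or `}}` stays plain text, as today; in strict mode it raises.

```python
from typing import List, Tuple


def split_template(template: str, strict: bool = False) -> List[Tuple[str, str]]:
    """Split a template into ("text", ...) and ("code", ...) segments.

    Hypothetical helper: non-strict keeps unmatched delimiters as text,
    strict raises ValueError on them.
    """
    segments: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(template):
        start = template.find("{{", pos)
        end = template.find("}}", start + 2) if start != -1 else -1
        if start == -1 or end == -1:
            # No complete {{ ... }} block remains: the rest is text.
            rest = template[pos:]
            if strict and ("{{" in rest or "}}" in rest):
                raise ValueError("Unmatched '{{' or '}}' in template: %r" % rest)
            segments.append(("text", rest))
            break
        text = template[pos:start]
        if strict and "}}" in text:
            raise ValueError("Unmatched '}}' before index %d" % start)
        if text:
            segments.append(("text", text))
        segments.append(("code", template[start + 2:end].strip()))
        pos = end + 2
    return segments


# Default (non-strict) behavior keeps a dangling '{{' as plain text.
print(split_template("{{ text here"))
# A complete block is split out from the surrounding text.
print(split_template("Hi {{$name}}!"))
```

Because `strict` defaults to `False`, existing templates would keep working, and the same function with `strict=True` could back a validation helper.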
Thanks for the response, the grammar intro was helpful. Is the Python version going to check for quotes like the C# parser does?
https://github.com/microsoft/semantic-kernel/blob/a80668598bbb4f077be9344d93bd08be38ea3114/dotnet/src/SemanticKernel/TemplateEngine/TemplateTokenizer.cs#L124
The Python version will parse {{ "}}" }} differently, right? Instead of yielding }}, it would split into a block containing " followed by the text " }}, because it does not fast-forward past a quoted value when it sees a ".
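A small sketch of the difference, assuming a scanner that looks for the closing `}}` of a block. `find_block_end` is a hypothetical helper, not code from either parser, and the backslash escape handling is an assumption about how quoted values behave.

```python
def find_block_end(template: str, start: int, quote_aware: bool = True) -> int:
    """Return the index of the '}}' closing the block opened at `start`, or -1.

    With quote_aware=True, a '}}' inside a quoted value is skipped.
    """
    cursor = start + 2
    in_quote = None  # the quote character currently open, if any
    while cursor < len(template) - 1:
        ch = template[cursor]
        if quote_aware and in_quote:
            if ch == "\\":
                cursor += 2  # skip the escaped character (assumed escape rule)
                continue
            if ch == in_quote:
                in_quote = None
        elif quote_aware and ch in ("'", '"'):
            in_quote = ch
        elif template[cursor:cursor + 2] == "}}":
            return cursor
        cursor += 1
    return -1


template = '{{ "}}" }}'

naive_end = find_block_end(template, 0, quote_aware=False)
print(repr(template[2:naive_end].strip()))   # '"'  (block ends at the quoted }})

aware_end = find_block_end(template, 0, quote_aware=True)
print(repr(template[2:aware_end].strip()))   # '"}}"'  (quoted value kept whole)
```

Without the fast-forward, the block ends at index 4 and the remaining `" }}` is left over as text; with it, the block ends at index 8.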
You're correct: the python and typescript branches do not support the new syntax yet, so they can fail to parse it. Support for "values" is a new feature we added only recently, so I think it's a matter of a couple of weeks until it's also available in python.
@ghadlich This PR updates the prompt templating engine for Python! https://github.com/microsoft/semantic-kernel/pull/200
I'd invite you to check it out and give feedback!
This is now in the python branch.