semantic-kernel
semantic-kernel copied to clipboard
Introducing trust checks in the kernel
Motivation and Context
Prompt injection attacks can fool the LLMs into doing not intended actions. This happens when untrusted input is injected into the prompt, fooling the LLM into doing something that is not intended.
To help identifying and handling such cases, we can start by tracking the trust of: (i) input variables; (ii) prompts; and (iii) function outputs. This way, if a potential sensitive function tries to run with untrusted content of any kind, the desired action can be taken (e.g. prevent the function from running and throwing an exception).
Description
The main idea behind this PR is to update the context to keep track of the trust state of its variables. So if the kernel identifies that an user defined sensitive function is trying to run with untrusted input, an action can be taken. Currently, the trust is defined as a boolean flag.
The main changes are:
- Update
ContextVariablesto store not only a single string, but an abstraction of a string calledTrustAwareStringthat also has a boolean flag about the trust of its content - Update the kernel to add a service for trust events. This service is currently named
TrustService, for which there is an interface definition:ITrustService. The user can implement this interface and set an instance of it in the kernel and/or for a specific function.- For now the interface includes two events that gets called to validate the input context and rendered prompt (semantic functions). They should return if the context/prompt is currently trusted or not. This can also be used for validation, sanitization and other tasks.
- There is a default implementation called
DefaultTrustServicethat when configured in the kernel, throws aUntrustedContentException
- Update semantic and native functions to include the following information:
bool IsSensitive: when set to true, configures the function to be sensitive (e.g. should not run with untrusted input) [default false]
- Implemented unit tests
The trust checks are also chained across multiple function calls. For example, imagine that we have 4 functions running like below:
flowchart LR
A["Function A"] --> B["Function B"] == output becomes untrusted ==> C["Function C"] --> D["Function D\n(SENSITIVE)"]
If the DefaultTrustService is used, function D will not execute, this happens because its input comes from Function C, that then comes from Function B, which is untrusted.
Further ideas (not implemented)
The idea of replacing the string usage within the Kernel with the TrustAwareString (that includes trust information) could be extended to other places in the kernel, making it easier to track provenance of the inputs and string operations. This has not been implemented in this PR because this would result in a bigger number of changes.
What's currently missing
- Currently the planner has not been updated to also track trust. This can be tackled as part of another PR if desired.
- Currently, if any of the variables stored in the context is untrusted, the entire context becomes untrusted, regardless if the variable has been used or not. This has been implemented this way for now to simplify the PR, since updating the template engine to track trust through the
TrustAwareStringwould result in a bigger number of changes.
Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows SK Contribution Guidelines (https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
- [ ] The code follows the .NET coding conventions (https://learn.microsoft.com/dotnet/csharp/fundamentals/coding-style/coding-conventions) verified with
dotnet format - [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone :smile:
@rracanicci Very interesting pattern. Thanks for a careful and thorough PR! I'll take a look.
Two topics on this we should discuss before merging:
- For design - should the RAI feature set be coupled with SK or factored separately so other orchestrators or components could work with it? I have thought the RAI work we do would be factored separately and this is tightly integrated with SK implementation. I could argue it either way.
- Tactically assuming we go forward with merging - should trust for variables/prompts/function outputs be binary or a sliding scale?
I'd like @dluc to also review this for overall design.
Two topics on this we should discuss before merging:
- For design - should the RAI feature set be coupled with SK or factored separately so other orchestrators or components could work with it? I have thought the RAI work we do would be factored separately and this is tightly integrated with SK implementation. I could argue it either way.
- Tactically assuming we go forward with merging - should trust for variables/prompts/function outputs be binary or a sliding scale?
I'd like @dluc to also review this for overall design.
Hello @timlaverty, thank you very much for your response, we are glad to have these discussions.
For Question 1 - What do you mean here Tim? Are you talking about just reorganizing the code, or creating other levels of abstractions to control trust? We are happy to do both for sure. We could even extract the trust flow from the SKFunction and inject it in the Kernel as an external dependency using DI.
For Question 2 - This makes a lot of sense, we initially created this as a boolean so we can now have these conversations. I think it could be binary, categorial, or even something else, depending on the use case. We could also abstract this away as an interface to make it more customizable.
I am really happy to do all the suggested changes, I just want to make sure I understand them so I can iterate on the PR.
Thank you very much again :)
+1 for interest. This has been something that has concerned me looking though the pattern and the implementation of this library.
Great work @rracanicci - LGTM
@rracanicci @dluc I left a bunch of comments but I don't think any should block this merge. Make a pass through, and @dluc can merge when ready. If there are outstanding things to address, we can file as Issues and fix or close them separately. Great work, great thought leadership -- looking forward to this in main!
@rracanicci @dluc I left a bunch of comments but I don't think any should block this merge. Make a pass through, and @dluc can merge when ready. If there are outstanding things to address, we can file as Issues and fix or close them separately. Great work, great thought leadership -- looking forward to this in main!
Thank you very much for your thoughtful comments @shawncal! I really appreciate the detailed review and comments.
I have addressed most of what you have proposed (the main one missing is the implicit conversion from TrustAwareString to string). The changes are in a separate branch on my fork. I haven't directly pushed it here in case you guys want to tackle them in a separate PR.
I will keep iterating on it :)