Properly decide on availability of `$secrets` argument
What would you like to be added:
Currently `$secrets` is documented to be available everywhere, but I know @cdavernas mentioned it should only be available in `input.from` to avoid leaks.
Why is this needed:
I understand the reasoning but I'm concerned it might have unintended consequences. Keeping $secrets accessible everywhere makes it clear to the author that they're dealing with sensitive information, ensuring caution in its use. However, if $secrets is only available in input.from, there's a risk that as the data moves through the pipeline, the secret could be aliased, potentially obscuring its sensitive nature.
```yaml
do:
  - someUnsafeCall:
      call: http
      with:
        method: get
        endpoint: ${ "https://server.com/api/unsafe-auth/\($secrets.token)" } # <-- the use of sensitive data is clear
```
vs
```yaml
do:
  - someUnsafeCall:
      input:
        from:
          token: ${ $secrets.token }
      call: http
      with:
        method: get
        endpoint: https://server.com/api/unsafe-auth/{token} # <-- the use of sensitive data is not obvious
```
@JBBianchi I agree with you, but on the other hand leaving them accessible everywhere is like telling the authors it's ok to use them everywhere, like when calling a (potentially custom) function.
Plus, if the user explicitly decided to flow it down, the responsibility is IMHO his.
@JBBianchi good point. It's much harder to accidentally leak the $secrets arg than the input
> @JBBianchi good point. It's much harder to accidentally leak the $secrets arg than the input
I disagree. I don't see why it would be harder. For most users, $secrets is just gonna be yet another runtime expression variable.
Anyway, whatever you guys decide is fine by me. I just think we should be extra careful with those, as it's our responsibility to "educate" authors on the proper approach to adopt when using this feature.
To quote ChatGPT's opinion on it:
In a Serverless Workflow, the availability of the $secrets argument should be carefully considered to balance security and flexibility.
1. Security Considerations
- Limiting Scope: Making `$secrets` available only at the workflow input (or in controlled, specific places) reduces the risk of accidental exposure or misuse of sensitive data. The broader the availability of secrets, the higher the chances of them being logged, passed to unintended functions, or exposed in runtime errors.
- Principle of Least Privilege: This principle suggests that secrets should only be accessible where absolutely necessary. By restricting `$secrets` to workflow input, you adhere to this principle, ensuring that secrets are only available to components that truly need them.
2. Flexibility Considerations
- Complex Workflows: In some workflows, you might need to access secrets at different stages, not just at the beginning. Allowing `$secrets` to be available throughout the workflow can enhance flexibility and make the workflow easier to design and maintain.
- Modular and Reusable Components: If different parts of the workflow are designed to be modular and reusable, they might require access to secrets independently. This scenario would argue for broader access to `$secrets`.
Best Practice Recommendation
Given the above considerations, the most secure approach is to limit the availability of $secrets to the workflow input and specific, well-defined points within the workflow where access is necessary. This could include:
- The start of the workflow to initialize required parameters.
- Specific states or functions that need secret access.
- Ensuring secrets are not accidentally passed down through the workflow unnecessarily.
Implementation Strategy
- Default Restriction: Make `$secrets` available by default only at the workflow input.
- Explicit Access: Allow developers to explicitly declare if and where they need access to `$secrets` within the workflow. This could be achieved via configuration or annotations, requiring an extra step to expose secrets, thus making users consciously decide when and where secrets are needed.
This approach balances security with the necessary flexibility, ensuring that secrets are protected while still allowing workflows to function as needed.
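For illustration, the "Explicit Access" strategy described above could look something like the following sketch. Note that the per-task `secrets` property and both endpoints are purely hypothetical and not part of the current DSL:

```yaml
do:
  - fetchRelease:
      secrets:
        - githubToken # hypothetical: explicitly expose only this secret to the task
      call: http
      with:
        method: get
        endpoint: https://api.github.com/repos/acme/app/releases/latest
        headers:
          Authorization: ${ "Bearer \($secrets.githubToken)" }
  - logResult:
      call: http
      with:
        method: post
        endpoint: https://server.local/log # no `secrets` declared, so $secrets would be empty here
```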
I think it's easier to leak when the secret has to be put in the input
Imagine I want to send some object (e.g. a pet) to an API-key-authenticated API. First I write:
```yaml
do:
  - apiCall:
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        body: ${ . }
```
... then I realise I forgot auth so I add it
```yaml
do:
  - apiCall:
      input:
        from:
          key: ${ $secrets.apiKey }
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ .key }
        body: ${ . }
```
and I already made the subtle mistake of now ONLY sending my API key, because input transformations have so much impact on the whole task. It would have been much harder to make such a mistake when using `$secrets`.
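For what it's worth, the pitfall above could be avoided under the current semantics by merging the secret into the existing input rather than replacing it (a sketch using a jq merge; note the body now has to strip the key again, which is itself easy to forget, which rather reinforces the concern):

```yaml
do:
  - apiCall:
      input:
        # merge the key into the incoming object instead of replacing the whole input
        from: ${ . + { key: $secrets.apiKey } }
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ .key }
        body: ${ del(.key) } # drop the key again so it isn't leaked in the body
```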
Maybe there is a mitigation to be found with implicitly "sharing" a new $secrets variable ?
e.g.:
```yaml
do:
  - apiCall:
      input:
        $secrets:
          key: ${ $secrets.apiKey }
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ $secrets.key }
        body: ${ . }
```
And, for instance
```yaml
do:
  - apiCall:
      input:
        from:
          key: ${ $secrets.apiKey }
      #...
```
would fail, because we would not inject `$secrets` when processing `input.from`, but only `input.$secrets`?
Just throwing ideas here.
> Would fail because we do not inject `$secrets` when processing `input.from` but only `input.$secrets`?
This feels like it achieves the same thing, just with extra steps, because now I have to use `${ .$secrets.foo }` (extra `.`) instead of `${ $secrets.foo }` inside the task, which seems really confusing. Plus I can see this just being used as:
```yaml
do:
  - apiCall:
      input:
        $secrets: ${ $secrets } # also looks terrible doesn't it 😅
      call: http
      with:
        ...
```
@matthias-pichler very good sample, which perfectly demonstrates your rightful concerns. You could however leak them in a similar fashion using the runtime expression argument.
Anyways, I won't endlessly ramble on the subject: I think you make a strong and valid point.
@JBBianchi even though I love both the idea and the (awesome) will to find a consensual solution, I feel this would complicate things for authors and might even be confusing.
To conclude, you guys convinced me, and have my vote to definitely close that issue!
I don't think we should dismiss this topic, the concern is real. Imagine a distributed runtime where each activity processor is a FaaS for instance.
Is it really a good idea to always send all the `$secrets` over the wire?
Shouldn't we find a way to explicitly send the $secrets required by the activity ?
In that case, shouldn't we enforce a variable name that clearly underlines the sensitive nature of the data to the author ?
> Imagine a distributed runtime where each activity processor is a FaaS for instance.
Very good point, which was also, if I remember properly, part of the motivation of the initial restriction.
@matthias-pichler Any idea/suggestion?
> Imagine a distributed runtime where each activity processor is a FaaS for instance.
Our runtime is currently implemented such that every task is run in a Lambda Function.
I understand that it's not ideal/least-privilege that every Lambda Function needs access to the secrets; however, given the tradeoffs, I would still prefer having `$secrets` everywhere. Furthermore, we currently have input transformation and task execution in two separate functions, but due to performance considerations I am merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common), it wouldn't change anything, and every Task Lambda Function would still need access to secrets for the sake of input transformations.
> > Imagine a distributed runtime where each activity processor is a FaaS for instance.
>
> Our runtime is currently implemented such that every task is run in a Lambda Function.
>
> I understand that it's not ideal/least privilege that every Lambda Function needs access to the secrets, however given the tradeoffs I would still prefer having `$secrets` everywhere. Furthermore currently we have input transformation and task execution in two separate functions but due to performance considerations I am currently merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common) it wouldn't change anything and every Task Lambda Function would still need access to secrets for the sake of input transformations.
Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.
What are the arguments against explicitly selecting the secrets provided to a task?
The following is just a conceptual example, the how can be adjusted:
```yaml
document:
  #...
use:
  secrets:
    - GitHubToken
    - XToken
do:
  - release:
      secrets:
        - GitHubToken # explicitly share the `GitHubToken` secret
      call: http
      with:
        method: GET
        endpoint:
          uri: https://api.github.com/my-repo/release
          token: ${ $secrets.GitHubToken }
  - announceOnX:
      secrets:
        - XToken # explicitly share the `XToken` secret
      call: http
      with:
        method: POST
        endpoint:
          uri: https://api.github.com/my-repo/release
          token: ${ $secrets.XToken }
        body: ${ "Checkout our new release at \(.artifact.url)" }
  - updateInternalRegistry:
      call: http
      with:
        method: POST
        endpoint:
          uri: https://server.local
          token: ${ $secrets.GitHubToken } # <--- error, $secrets is empty
```
It does add an extra step but it doesn't seem to be a huge overhead.
> Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.
I don't quite agree with "share secrets with 'everyone'"... all the Lambda Functions are managed and implemented by us, and it would not be any different if they were run in a container. If a custom function (which we don't control) is invoked, the input and any expressions in it are evaluated on our side, and then the untrusted code is executed with the parsed input... I don't really see whom we'd be sharing the secrets with?
> What are the arguments against "explicitly select" the secrets provided to a task?
I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times, having to specify it 5 times in total (including at the workflow level). It's also much more overhead around secrets than I have seen from any other workflow engine. E.g. in Pipedream, GitHub, GitLab, etc., you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated.
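To make the repetitiveness concrete, with a per-task allowlist (reusing the hypothetical `secrets` property from the conceptual example above) a workflow calling the same API four times would look roughly like:

```yaml
use:
  secrets:
    - apiKey # declared once at the workflow level...
do:
  - createPet:
      secrets: [ apiKey ] # ...and repeated on every task that calls the API
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ $secrets.apiKey }
  - updatePet:
      secrets: [ apiKey ]
      call: http
      with:
        method: put
        endpoint: https://server.com/api/pets/1
        headers:
          X-API-Key: ${ $secrets.apiKey }
  # ...plus two more similar calls, each repeating `secrets: [ apiKey ]`
```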
> I don't quite agree with "share secrets with 'everyone'"... all the Lambda Functions are managed and implemented by us and it would not be any different if it were run in a container. If a custom function (which we don't control) is invoked the input and any expressions in it are evaluated on our side and then the untrusted code is executed with the parsed input... I don't really see who'd we be sharing the secrets with?
I must admit I dramatized the situation (although I wouldn't say relying on a 3rd-party infrastructure is the same as running a container on your own isolated host). My concern doesn't 100% apply to the situation you describe. The problem would rather be when calling a 3rd-party activity processor, or if the network has been compromised (e.g. subject to MitM attacks).
> I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times having to specify it 5 times in total (including at the workflow level). It's also so much more overhead around secrets that I have not seen from any other workflow engine. E.g. Pipedream, GitHub, GitLab, etc you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated
I agree, it's not very pretty and quite cumbersome.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.