Allowing GCP provisioner to issue SSH User Certificates - Option 2
Name of feature:
Allowing GCP provisioner to issue SSH User Certificates - Option 2
Pain or issue this feature alleviates:
Why is this important to the project (if not answered above):
Workloads running in GCP Compute Instances are run with an assigned Service Account. The Service Account authenticated on a given Compute Instance can be found in the Identity Token obtained from the metadata server that the GCP provisioner uses to obtain the Compute Instance identity and generate the SSH Host Certificate.
Allowing the GCP provisioner to issue SSH User Certificates would allow the above referred Workloads to use the smallstep infrastructure to sign into other Compute Instances. Examples of workloads that would benefit from this change are: CICD systems like Jenkins and Ansible.
Without this feature there would be two other options to achieve this:
- Have a separate JWK provisioner: This provisioner is present in the ca.json configuration file.
- Have an X5C provisioner to generate an intermediary X.509 certificate to then issue the SSH User Certificate from it: This option involves the creation of an intermediary certificate that could be used for TLS and that will need to be maintained along with the SSH User Certificate.
However, neither of these can validate the service account principal.
Is there documentation on how to use this feature? If so, where?
If this change is accepted we could update the documentation for the GCP provisioner here
In what environments or workflows is this feature supported?
This would work for smallstep-ca deployments that support GCP
In what environments or workflows is this feature explicitly NOT supported (if any)?
This will not work outside of GCP
Supporting links/other PRs/issues:
This proposal leverages the use of the Context to find out what kind of certificate is requested because the only arguments available to the AuthorizeSSHSign function are the context and the ID token (other provisioners that support both host and user certificates get access to an access token that has this information embedded). Because of this there is a significant refactor in the AuthorizeSSHSign function but the tests are almost untouched.
We propose a different approach, which tries to make the fewest changes, at:
https://github.com/smallstep/certificates/pull/1557
❤️ Thank you!
Hello @maraino
Thank you for considering this change ~
Breaking out just the creation of the initial signOptions seems like an interesting approach; I could take a look at that.
I like the DisableCustomSANs, having it enabled in the context of a User certificate would give us confidence that it is granted only to the service account attached to the instance. Could you expand on the complications with the SSH servers?
About the two new claims, I'm all in! Originally I considered the option to create a separate provisioner that handles just the User but I thought it was overkill; having new claims would be more flexible.
I like the DisableCustomSANs, having it enabled in the context of a User certificate would give us confidence that it is granted only to the service account attached to the instance. Could you expand on the complications with the SSH servers?
DisableCustomSANs always enabled for user certificates means that the principals of the certificate will only contain the email and the sanitized email. That is probably the safest configuration.
Initially, I thought that the email had some kind of instance id and it would change if you re-create your VM, but after looking at it, I've noticed that it is the service account email. So, allowing those users to log in on SSH servers is a problem.
About the two new claims, I'm all in! Originally I considered the option to create a separate provisioner that handles just the User but I thought it was overkill; having new claims would be more flexible.
It is better to keep everything in the same provisioner and add new claims to disable User or Host SSH certificates. I think they should both be enabled by default if enableSSHCA is set to true. In fact, to be backward compatible, enableSSHCA should always be true to enable any of these flows.
I think it is better to split this functionality into two PRs; perhaps I can work on the claims, and you on enabling the SSH user certificates.
I still have to talk to my team about the DisableCustomSANs thing, so you can assume that it is always true for user certificates.
I've added this to be discussed internally https://github.com/smallstep/certificates/issues/1571, you don't have to worry about this on this PR
Initially, I thought that the email had some kind of instance id and it would change if you re-create your VM, but after looking at it, I've noticed that it is the service account email. So, allowing those users to log in on SSH servers is a problem.
The reason why I think it is a good idea to allow the creation of a user certificate with the service account principal regardless of which Compute Instance we are working on (even if there is no host metadata in the account email), is that in the GCP realm, any Compute Instance that is running with a Service Account will have that account enabled in the application default credentials. This means that any process being run by the machine or a user in that machine will "inherit" the permissions of that service account.
Ex. I have a Compute Instance stepca-server and I signed in with my user and password. When I check the Google credentials (without using gcloud auth login) I'm already authenticated to GCP with the VM's service account. If I were able to create a user certificate with the GCP provisioner taking the identity of the service account programmatically I could run a service like Ansible or Jenkins that can sign in to other VMs (enabled with smallstep) and do stuff.
adantop@stepca-server:~$ id
uid=655354274(adantop) gid=1000(ubuntu) groups=1000(ubuntu)
adantop@stepca-server:~$ gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* [email protected]
To set the active account, run:
$ gcloud config set account `ACCOUNT`
Hey @maraino,
I've updated the code to generate the SignSSHOptions in a separate function and leave the main method untouched. Could you please review and tell me if this is what you were envisioning? After your feedback I'll look into adding the GCP provisioner attributes to toggle the host and user cert creation. I'm thinking about adding them in the GCP struct as enableSSHCAHost and enableSSHCAUser.
I ended up adding a disableSSHCAHost and an enableSSHCAUser to the GCP struct.
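A minimal sketch of how the two fields named above might decode from a provisioner entry in ca.json. The `gcpProvisioner` type and `parseProvisioner` helper are hypothetical stand-ins written for illustration, not the struct from the PR; only the JSON keys `disableSSHCAHost` and `enableSSHCAUser` come from this thread. Omitting both keys leaves them false, so user certificates stay off unless explicitly enabled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gcpProvisioner is a hypothetical, trimmed-down stand-in for the GCP
// provisioner configuration. Only the two toggles discussed in this
// thread are modeled; the real struct has many more fields.
type gcpProvisioner struct {
	Type             string `json:"type"`
	Name             string `json:"name"`
	DisableSSHCAHost bool   `json:"disableSSHCAHost,omitempty"`
	EnableSSHCAUser  bool   `json:"enableSSHCAUser,omitempty"`
}

// parseProvisioner decodes a single provisioner entry from JSON.
func parseProvisioner(raw string) (gcpProvisioner, error) {
	var p gcpProvisioner
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

func main() {
	// Assumed ca.json fragment: user certificates explicitly enabled,
	// host certificates left at their default (not disabled).
	p, err := parseProvisioner(`{"type": "GCP", "name": "GCP", "enableSSHCAUser": true}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```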
@adantop this is still on my list.
Thanks @maraino, I fixed the lint issue and rebased from master
Hey @adantop 👋 . Pleasure to e-meet you!
We discussed this issue in greater depth today and now have a bit clearer perspective on this PR.
First off, we strive not to introduce inconsistencies in regards to platform related feature support. So, if we were to accept these changes, we would also want to introduce similar support for the other popular cloud providers (AWS and Azure at least).
Second (and more importantly), we're wary of the security implications surrounding this change. We consider SSH user certificates to be more "vulnerable" (and of potentially higher security value) than SSH host certificates. Meaning that we think there should be more care around authorization and issuance of SSH user certificates. In a similar vein, we are also wary of the security features of Instance Identity Cloud Document provisioners - for example these provisioners can easily be configured to be less secure (disableTrustOnFirstUse).
Our preferred and recommended approach to this problem (and the one we implement ourselves in our product) is to use the x5c provisioner, as you originally mentioned.
tl;dr, we don't want to definitively say 'NO' to adding this feature, however, we are hesitant about making a change that we believe could affect the security characteristics of issued certificates. Unfortunately, we cannot promise any progress on formulating a definitive opinion or moving this PR forwards in the short to medium term.
Having said this, we're definitely open to having our minds changed by yourself or others in the community.
Thank you for contributing to the project (we really do appreciate it); and we're sorry that we can't merge the contribution at this time.
Hey @dopey and @maraino, thanks for taking a look! I was helping @adantop with this and wanted to see if I could dig a little deeper into your response.
I don't have a good counter-argument to this being an inconsistency between the provisioners, so I'll put that aside for now. I personally think it's okay for a provisioner to have more features than another, especially if it benefits the users of that particular provisioner (i.e. us, for GCP), but I'll defer to you all on that.
Regarding the security implications, I'm not entirely sure I follow, for three reasons.
One, I think that allowing for custom SANs is inherently insecure (albeit can be somewhat mitigated by trust on first use, but we've been taking a hard stance on disabling custom SANs), and yet that is an option that users are allowed to configure at their own risk. Here, because of our aforementioned stance on custom SANs, we believe that generating an SSH user certificate for the service account associated with a machine is entirely safe - but please let me know if there is something we're overlooking!
Two, the K8sSA "can be used to sign a CSR with any SANs", and is eligible for ssh-user-cert-sign. Our proposed addition to the GCP provisioner would be more secure - when disallowing custom SANs - than this, but at worst would be equally potentially insecure, so I think there is precedent in allowing it.
Three, it seems like we'd have the same potential security issues if we used X.509 certificates and the X5C provider. A user could, as you said, allow custom SANs and set disableTrustOnFirstUse in order to generate an X.509 certificate with any SANs that they wanted, and then proceed to use the X5C provider to get their desired SSH user certificate. In addition, "[b]y default, the X5C provisioner will issue a certificates for any Subject names", so even if custom SANs were disabled or disableTrustOnFirstUse was false, a user would need to additionally configure the X5C provisioner in order to securely generate SSH user certificates.
Finally, I'd like to emphasize our context for this pull request, at the risk of over-explaining. We'd like to have a secure pattern for Google Cloud VM-to-VM SSH, utilizing the source VM's GCP service account. While we could use Google's OS Login product, there are some limitations (e.g. LDAP interoperability) that make it unsuitable for us. Using the Smallstep GCP provisioner to generate an SSH certificate for the service account bound to the instance is an attractive alternative that, in our case of disabling custom SANs, would provide us the security guarantees we're looking for: any VM with a given service account should be able to get an SSH user certificate with that service account as a principal, and we can then allow that principal to SSH into destination VMs.
Anyhow, apologies for the wall of text and thanks again for your time!
@adantop @ericnorris There's a workaround that you can use, for example this template creates a user certificate:
{
"type": "user",
"keyId": {{ toJson .KeyID }},
"principals": {{ toJson .Principals }},
"extensions": {
"permit-X11-forwarding": "",
"permit-agent-forwarding": "",
"permit-port-forwarding": "",
"permit-pty": "",
"permit-user-rc": ""
}
}
If you want to create user and server certificates, you can edit this template and decide what to do by looking at the .Token. Alternatively, you can create two different GCP provisioners. I think GCP allows this because we can set the audience, but I haven't tested it.
Thanks, I'll try it out
Hey @dopey and @maraino, apologies for resurrecting an old thread, but now that I've had some time to dig into this, I wanted to point out a couple flaws with the workaround above that make it nearly unsuitable for use in production.
- In order to deploy the workaround for user SSH certificates using the host certificate template, we must either have two separate provisioners, one with the host-SSH-certificate-but-really-user-SSH-certificate template and one with the default template; or we have to put conditional logic in the template to allow the user to request that the template produce a host certificate instead:
  {{ if .Insecure.User.WantUserCertificate }} "type": "user",
- In order to retrieve a user certificate, you must use the step ssh certificate --host command, and specify a hostname, even though it's not a host certificate, nor is the hostname used:
  $ step ssh certificate --host --set WantUserCertificate=true --host $(hostname -f)
  This looks really weird, and only makes sense if you understand the trickery that is happening server-side.
- You cannot use step ssh proxycommand, since that will not use the --host flag.
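To make the conditional-template idea in the first point concrete, here is a rough sketch that renders such a template locally with Go's text/template. It rests on several assumptions: the template text is an illustrative variant rather than the one actually deployed, `toJson` is a local stand-in for the helper step-ca provides, and the data map only imitates the shape implied by `.Insecure.User.WantUserCertificate`. One thing it surfaces is that `if` only checks for a non-empty value, so if the variable arrives as a string, "false" would select the user branch just as "true" does.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// sshTemplate is a hypothetical variant of the workaround template: it
// switches the certificate type on a user-supplied variable.
const sshTemplate = `{
	"type": {{ if .Insecure.User.WantUserCertificate }}"user"{{ else }}"host"{{ end }},
	"keyId": {{ toJson .KeyID }},
	"principals": {{ toJson .Principals }}
}`

// toJSON is a local stand-in for the toJson template helper.
func toJSON(v any) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// renderType renders the template with the given user variables and
// returns the "type" field of the resulting JSON document.
func renderType(userVars map[string]any) (string, error) {
	tmpl, err := template.New("ssh").
		Funcs(template.FuncMap{"toJson": toJSON}).
		Parse(sshTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"KeyID":      "example-key-id",              // placeholder value
		"Principals": []string{"example-principal"}, // placeholder value
		"Insecure":   map[string]any{"User": userVars},
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	var out struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(buf.Bytes(), &out); err != nil {
		return "", err
	}
	return out.Type, nil
}

func main() {
	// First without the variable, then with it set as a string.
	for _, vars := range []map[string]any{{}, {"WantUserCertificate": "true"}} {
		certType, err := renderType(vars)
		if err != nil {
			panic(err)
		}
		fmt.Println(certType)
	}
}
```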
None of these are individually a deal-breaker, but in total it ends up being a lot of extra effort to implement, and understanding how it works requires implicit knowledge. I think in my earlier comment I made a strong case for why this is no different than some of the existing provisioners, so I'm wondering if you could reconsider this PR?
@ericnorris, the GCP provisioner was always intended for hosts. If you want to generate host and user certificates using the same provisioner, you might have a difficult time as you will need to deal with the template. Using given insecure variables is not the only way to change the behavior of a template; you can also use webhooks, look inside the token, or look at the principals.
The easiest way is to use two different provisioners, as this will simplify the template. On the client side, you will need to use the --host flag, as the GCP provisioner will verify that this is the given one.
You are right about step ssh proxycommand. When automatically configuring the certificate, proxycommand won't get a new certificate if one already exists in the agent, but I suppose you can wrap proxycommand in a simple script to get the certificate and install it in the agent if it is not already present.
Thanks for the quick response @maraino!
the GCP provisioner was always intended for hosts.
Understood, and this PR would make it so that it could be used for GCP service accounts, much like the K8sSA provisioner. If you object to adding this functionality to the GCP provisioner and would prefer to keep it hosts-only, would you object to an entirely separate provisioner type?
Using given insecure variables is not the only way to change the behavior or a template; you can also use webhooks, look inside the token, or look at the principals.
I'm not sure we could look inside the token considering it's the same for user and host certificates; the token comes from the GCP metadata server and since the smallstep CLI is generating it, it will be the same. Otherwise though, point taken.
The easiest way is to use two different provisioners, as this will simplify the template.
While it's certainly easier to have two of the GCP provisioners configured with different templates since the template doesn't need to have conditional logic in it, it doesn't hide the fact that you need to ask it for a host certificate even though you expect it to return a user certificate. The reason I took the insecure variable approach is fundamentally due to this weirdness; at least --set WantUserCertificate=true sends some sort of signal to someone else looking at the code that we're not getting back a host certificate.
...but I suppose you can wrap proxycommand in a simple script to get the certificate and install it in the agent if it is not already present.
That's a good point! That said, the big draw of the proxycommand command is that it will handle the renewal automatically; if I'm renewing the certificate before calling proxycommand I might as well not call proxycommand at all.
Fundamentally I think our use-case is not exotic or unreasonable - we'd like to use GCP service account identities to authenticate via SSH, just like we'd use Google user identities. This allows us to avoid having long-lived SSH keys that we'd need to distribute to hosts, which is a huge security win, in our opinion.
The GCP provisioner already has the "tough" part of supporting this implemented, since it is already validating and parsing the identity token, and this PR doesn't seem (to me) to be adding any additional attack surface or complexity to the codebase.
That said, if this is not something you'd like to support, that's fair! If so, then we'll likely need to explore other options since this use-case is important to us.
If it is something you'd be okay with supporting though, and this PR is not what you have in mind, could you point us in the right direction so that we could attempt to contribute this ourselves?
@ericnorris, we will talk about this during our open-source triage meeting next week.
@ericnorris after some discussion, our team generally agrees with the requested feature and the reasoning behind it. However, we were missing some key decision makers in our internal discussion. They'll be back from vacation next week. We'll discuss again, making sure we have the whole team's buy-in, and then I'll come back and update the issue with next steps.
Great, thank you for keeping me updated @dopey! Looking forward to hearing back about this.
Hey @ericnorris, apologies that it's taken a while to follow up.
We are open to integrating this change into the software under the condition that a configuration option be added to the ca.json to enable and disable the new behavior. By default the option to enable should be set to false. It's been a good while since this PR was first submitted (so, I'm not sure if it's still active) - but if someone from the community were to add our proposed change we would re-review and work with them to get it merged.
cheers 🍻
Hey smallstep team (@maraino, @dopey),
We've updated the PR based on your initial feedback, please take a look!
Some things we'd like to call out:
- We're not encoding the "sanitized" email address of the service account as a principal. This is because the portion before the @ sign is only unique to a project, and is not globally unique. This will conflict with the default behavior of step-cli since it appears that the sanitized email address is added to the certificate request:

  step ssh certificate \
    --provisioner=GCP --no-agent \
    --insecure --no-password \
    [email protected] ssh_user
  ✔ Provisioner: GCP (GCP)
  ✔ CA: https://stepca-test.us-central1-a.c.acme-smallstep-dev.internal
  The request was forbidden by the certificate authority: ssh certificate principals does not match - got [step-ca-runner [email protected]], want [sa_105642819181943026504 [email protected]].
  Re-run with STEPDEBUG=1 for more info.

  Notice that the step-ca-runner principal is rejected. This is solvable by passing the principals using the --principal flag to step-cli, which we think is acceptable. We don't think that the step-cli should be updated to account for this behavior, and instead we could call it out in the documentation for this provisioner.
- We think it's worthwhile to follow Google's precedent of using the service account unique ID (present in the JWT as the sub claim) with the sa_ prefix. This is the format of usernames that they use in https://cloud.google.com/compute/docs/oslogin.
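The reasoning in the two points above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the service account emails are invented, `localPart` is a simplified approximation of the "sanitized" email (just the portion before the @ sign, as described above), and `uniqueIDPrincipal` is a hypothetical helper, not a function from the PR. The subject value is the one that appears in the error output above.

```go
package main

import (
	"fmt"
	"strings"
)

// localPart approximates the "sanitized" email principal as the portion
// before the @ sign. The real sanitization may do more than this.
func localPart(email string) string {
	name, _, _ := strings.Cut(email, "@")
	return name
}

// uniqueIDPrincipal builds an OS Login style username from the service
// account unique ID carried in the identity token's sub claim.
func uniqueIDPrincipal(sub string) string {
	return "sa_" + sub
}

func main() {
	// Two invented service accounts with the same name in different projects.
	a := "runner@project-a.iam.gserviceaccount.com"
	b := "runner@project-b.iam.gserviceaccount.com"

	// Both collapse to the same local part, so that principal alone cannot
	// tell the two accounts apart.
	fmt.Println(localPart(a), localPart(b), localPart(a) == localPart(b))

	// The unique ID is assigned per service account, so the derived
	// principal does not depend on the account name.
	fmt.Println(uniqueIDPrincipal("105642819181943026504"))
}
```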
Validations
We ran the standard tests with the go test command on the ca module and they passed. We also ran certificate issuance tests in a GCP environment; below is the matrix of certificate issuance capabilities for enableSSHCAUser and disableSSHCAHost combinations:
| | enableSSHCAUser=true | enableSSHCAUser=false |
|---|---|---|
| disableSSHCAHost=true | sshHost: no, sshUser: yes, tls: yes | sshHost: no, sshUser: no, tls: yes |
| disableSSHCAHost=false | sshHost: yes, sshUser: yes, tls: yes | sshHost: yes, sshUser: no, tls: yes |
Note: We also tested the same combinations with enableSSHCA=false and the results are as expected (able to issue TLS and unable to issue sshHost and sshUser).
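The matrix and the note above can be summarized as a small decision function. This is only a sketch of the observed behavior, assuming the three flags combine as the test results suggest; the `issuance` type and `allowed` function are hypothetical names and do not reflect how the provisioner is implemented.

```go
package main

import "fmt"

// issuance records which certificate kinds a provisioner would issue.
type issuance struct {
	SSHHost bool
	SSHUser bool
	TLS     bool
}

// allowed mirrors the tested matrix: enableSSHCA gates every SSH flow,
// host certificates are on unless disabled, user certificates are off
// unless enabled, and TLS issuance is unaffected by these flags.
func allowed(enableSSHCA, enableSSHCAUser, disableSSHCAHost bool) issuance {
	return issuance{
		SSHHost: enableSSHCA && !disableSSHCAHost,
		SSHUser: enableSSHCA && enableSSHCAUser,
		TLS:     true,
	}
}

func main() {
	// Print the four combinations from the matrix with enableSSHCA=true.
	for _, disableHost := range []bool{true, false} {
		for _, enableUser := range []bool{true, false} {
			fmt.Printf("disableSSHCAHost=%t enableSSHCAUser=%t -> %+v\n",
				disableHost, enableUser, allowed(true, enableUser, disableHost))
		}
	}
}
```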
User certificate test evidence
step ssh certificate --provisioner=GCP --no-agent --insecure --no-password --principal [email protected] [email protected] ssh_user
✔ Provisioner: GCP (GCP)
✔ CA: https://stepca-test.us-central1-a.c.acme-smallstep-dev.internal
✔ Private Key: ssh_user
✔ Public Key: ssh_user.pub
✔ Certificate: ssh_user-cert.pub
step ssh inspect ssh_user-cert.pub
ssh_user-cert.pub:
Type: [email protected] user certificate
Public key: ECDSA-CERT SHA256:bv/GV10R6rSXl104dBucZ6Bb3/HywMD59/Jd4mtwP9Y
Signing CA: ECDSA SHA256:ImQbM8XmPfuIWAsImjwsuhqAZW+3GlUlA2v0e8bzcIE (using ecdsa-sha2-nistp384)
Key ID: "[email protected]"
Serial: 4092897030217493163
Valid: from 2024-05-09T19:25:20 to 2024-05-10T11:26:20
Principals:
sa_105642819181943026504
[email protected]
Critical Options: (none)
Extensions:
permit-agent-forwarding
permit-port-forwarding
permit-pty
permit-user-rc
permit-X11-forwarding
Signature:
00:00:00:31:00:86:f7:ec:13:38:a1:c5:1b:b4:9b:b7:
b4:46:c1:ec:70:b4:37:b0:22:58:9d:b4:80:bf:f7:58:
13:62:57:c5:78:cc:3d:0c:33:46:f5:9b:e7:52:c0:ef:
fa:06:e7:24:8c:00:00:00:30:1a:dd:d3:fb:e8:15:d4:
70:2e:4b:b1:49:c4:70:b2:23:87:dd:56:30:c2:4c:40:
17:d9:e4:c1:1d:8b:fb:ef:85:7c:c6:58:a9:d8:6c:17:
ce:3e:1b:6c:82:0d:80:62:e6
Script that generated the test results
#!/usr/bin/bash
set -x
export STEPPATH=/etc/step
test_name=$1
ci_fqdn="$(hostname -f)"
sa_email="$(gcloud auth list --format=json --filter=status:ACTIVE | jq -r '.[0].account')"
sshHost=false
sshUser=false
tls=false
mkdir -vp "${test_name}"
step ssh certificate --host --provisioner=GCP --no-agent --insecure --no-password "${ci_fqdn}" "${test_name}/ssh_host"
if [[ $? -eq 0 ]]; then
step ssh inspect "${test_name}/ssh_host-cert.pub"
sshHost=true
fi
sleep 2
step ssh certificate --provisioner=GCP --no-agent --insecure --no-password --principal "${sa_email}" "${sa_email}" "${test_name}/ssh_user"
if [[ $? -eq 0 ]]; then
step ssh inspect "${test_name}/ssh_user-cert.pub"
sshUser=true
fi
sleep 2
step ca certificate --provisioner=GCP "${ci_fqdn}" "${test_name}/tls.crt" "${test_name}/tls.key"
if [[ $? -eq 0 ]]; then
step certificate inspect "${test_name}/tls.crt"
tls=true
fi
echo "{\"sshHost\": ${sshHost}, \"sshUser\": ${sshUser}, \"tls\": ${tls}}" > "${test_name}/result.json"
👋🏽 @maraino
I've updated the PR according to the feedback provided, could you review again?
Thanks