certificates icon indicating copy to clipboard operation
certificates copied to clipboard

Allowing GCP provisioner to issue SSH User Certificates - Option 2

Open adantop opened this issue 1 year ago • 23 comments

Name of feature:

Allowing GCP provisioner to issue SSH User Certificates - Option 2

Pain or issue this feature alleviates:

Why is this important to the project (if not answered above):

Workloads running in GCP Compute Instances are run with an assigned Service Account. The Service Account authenticated on a given Compute Instance can be found in the Identity Token obtained from the metadata server that the GCP provisioner uses to obtain the Compute Instance identity and generate the SSH Host Certificate.

Allowing the GCP provisioner to issue SSH User Certificates would allow the above referred Workloads to use the smallstep infrastructure to sign into other Compute Instances. Examples of workloads that would benefit from this change are: CICD systems like Jenkins and Ansible.

Without this feature there would be two other options to achieve this:

  • Have a separate JWK provisioner: This provisioner is present in the ca.json configuration file.
  • Have a X5C provisioner to generate an intermediary X.509 certificate to then issue the SSH User Certificate from it: This option involves the creation of an intermediary certificate that could be used for TLS that will need to be maintained along with the SSH User Certificate.

However neither of these can validate the service account principal.

Is there documentation on how to use this feature? If so, where?

If this change is accepted we could update the documentation for the GCP provisioner here

In what environments or workflows is this feature supported?

This would work for smallstep-ca deployments that support GCP

In what environments or workflows is this feature explicitly NOT supported (if any)?

This will not work outside of GCP

Supporting links/other PRs/issues:

This proposal leverages the use of the Context to find out what kind of certificate is requested because the only arguments available to the AuthorizeSSHSign is the context and the id Token (other provisioners that support both host and user certificates get access to an access token that has this information embedded). Because of this there is a significant refactor in the AuthorizeSSHSign function but the tests are almost untouched.

We propose a different approach which tries to do the least changes at:

https://github.com/smallstep/certificates/pull/1557

❤️ Thank you!

adantop avatar Sep 26 '23 18:09 adantop

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 26 '23 18:09 CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Sep 26 '23 18:09 CLAassistant

Hello @maraino

Thank you for considering this change ~

Regarding the break-up of just the creation of the initial signOptions seems an interesting approach I could take a look at that.

I like the DisableCustomSANs, having it enabled in the context of a User certificate would give us confidence that it is granted only to the service account attached to the instance. Could you expand on the complications with the SSH servers?

About the two new claims, I'm all in! Originally I considered the option to create a separate provisioner that handles just the User but I tought it was overkill; having new claims would be more flexible.

adantop avatar Oct 04 '23 15:10 adantop

I like the DisableCustomSANs, having it enabled in the context of a User certificate would give us confidence that it is granted only to the service account attached to the instance. Could you expand on the complications with the SSH servers?

DisableCustomSANs always enabled for user certificates means that the principals of the certificate will only contain the email and the sanitized email. That is probably the safest configuration.

Initially, I thought that the email had some kind of instance id and it would change if you re-create your VM, but after looking at it, I've noticed that it is the service account email. So, allowing those users to log in on SSH servers is a problem.

About the two new claims, I'm all in! Originally I considered the option to create a separate provisioner that handles just the User but I tought it was overkill; having new claims would be more flexible.

It is better to keep everything in the same provisioner and add new claims to disable User or Host SSH certificates. I think they should be both enabled by default if enableSSHCA is set to true. In fact to be backward compatible enableSSHCA should always true to enable any of these flows.

I think is better to split this functionality in two PRs, perhaps I can work on the claims, and you on enable the SSH user certificates.

I still have to talk to my team about the DisableCustomSANs thing, so you can assume that is always true for user certificates.

maraino avatar Oct 04 '23 18:10 maraino

I've added this to be discussed internally https://github.com/smallstep/certificates/issues/1571, you don't have to worry about this on this PR

maraino avatar Oct 04 '23 19:10 maraino

Initially, I thought that the email had some kind of instance id and it would change if you re-create your VM, but after looking at it, I've noticed that it is the service account email. So, allowing those users to log in on SSH servers is a problem.

The reason why I think it is a good idea to allow the creation of a user certificate with the service account principal regardless of which Compute Instance we are working on (even if there is no host metadata in the account email), is that in the GCP realm, any Compute Instance that is running with a Service Account will have that account enabled in the application default credentials. This means that any process being run by the machine or a user in that machine will "inherit" the permissions of that service account.

Ex. I have a Compute Instance stepca-server and I signed in with my user and password. When I check the google credentials (without using the gcloud auth login) I'm already authenticated to GCP with the VM's service account. If I were able to create a user certificate with the GCP provisioner taking the identity of the service account programmatically I could run a service like Ansible or jenkins that can sign in to other VMs (enabled with smallstep) and do stuff.

adantop@stepca-server:~$ id
uid=655354274(adantop) gid=1000(ubuntu) groups=1000(ubuntu)
adantop@stepca-server:~$ gcloud auth list
                          Credentialed Accounts
ACTIVE  ACCOUNT
*       [email protected]

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

adantop avatar Oct 05 '23 15:10 adantop

Hey @maraino,

I've updated the code to generate the SignSSHOptions in a separate function and leave the main method untouched, could you please review and tell if this is what you were envisioning? after your feedback I'll look into adding the GCP provisioner attributes to toggle the host and user cert creation. I'm thinking about adding them in the GCP struct as enableSSHCAHost and enableSSHCAUser

adantop avatar Oct 10 '23 17:10 adantop

I ended up adding a disableSSHCAHost and enableSSHCAUser to the GCP struct

adantop avatar Oct 17 '23 18:10 adantop

@adantop this is still on my list.

maraino avatar Oct 24 '23 19:10 maraino

Thanks @maraino, I fixed the lint issue and rebased from master

adantop avatar Nov 06 '23 21:11 adantop

Hey @adantop 👋 . Pleasure to e-meet you!

We discussed this issue in greater depth today and now have a bit clearer perspective on this PR.

First off, we strive not to introduce inconsistencies in regards to platform related feature support. So, if we were to accept these changes, we would also want to introduce similar support for the other popular cloud providers (AWS and Azure at least).

Second (and more importantly), we're wary of the security implications surrounding this change. We consider SSH user certificates to be more "vulnerable" (and of potentially higher security value) than SSH host certificates. Meaning that we think there should be more care around authorization and issuance of SSH user certificates. In a similar vein, we are also wary of the security features of Instance Identity Cloud Document provisioners - for example these provisioners can easily be configured to be less secure (disableTrustOnFirstUse).

Our preferred and recommended approach to this problem (and the one we implement ourselves in our product) is to use the x5c provisioner, as you originally mentioned.

tl;dr, we don't want to definitively say 'NO' to adding this feature, however, we are hesitant about making a change that we believe could affect the security characteristics of issued certificates. Unfortunately, we cannot promise any progress on formulating a definitive opinion or moving this PR forwards in the short to medium term.

Having said this, we're definitely open to having our minds changed by yourself or others in the community.

Thank you for contributing to the project (we really do appreciate it); and we're sorry that we can't merge the contribution at this time.

dopey avatar Nov 08 '23 07:11 dopey

Hey @dopey and @maraino, thanks for taking a look! I was helping @adantop with this and wanted to see if I could dig a little deeper into your response.

I don't have a good counter-argument to this being an inconsistency between the provisioners, so I'll put that aside for now. I personally think it's okay for a provisioner to have more features than another, especially if it benefits the users of that particular provisioner (i.e. us, for GCP), but I'll defer to you all on that.

Regarding the security implications, I'm not entirely sure I follow, for three reasons.

One, I think that allowing for custom SANs is inherently insecure (albeit can be somewhat mitigated by trust on first use, but we've been taking a hard stance on disabling custom SANs), and yet that is an option that users are allowed to configure at their own risk. Here, because of our aforementioned stance on custom SANs, we believe that generating an SSH user certificate for the service account associated with a machine is entirely safe - but please let me know if there is something we're overlooking!

Two, the K8sSA "can be used to sign a CSR with any SANs", and is eligible for ssh-user-cert-sign. Our proposed addition to the GCP provisioner would be more secure - when disallowing custom SANs - than this, but at worst would be equally potentially insecure, so I think there is precedent in allowing it.

Three, it seems like we'd have the same potential security issues if we used X.509 certificates and the X5C provider. A user could, as you said, allow custom SANs and set disableTrustOnFirstUse in order to generate an X.509 certificate with any SANs that they wanted, and then proceed to use the X5C provider to get their desired SSH user certificate. In addition, "[b]y default, the X5C provisioner will issue a certificates for any Subject names", so even if custom SANs were disabled or disableTrustOnFirstUse was false, a user would need to additionally configure the X5C provisioner in order to securely generate SSH user certificates.

Finally, I'd like to emphasize our context for this pull request, at the risk of over-explaining. We'd like to have a secure pattern for Google Cloud VM-to-VM SSH, utilizing the source VM's GCP service account. While we could use Google's OS Login product, there are some limitations (e.g. LDAP interoperability) that make it unsuitable for us. Using the Smallstep GCP provisioner to generate an SSH certificate for the service account bound to the instance is an attractive alternative that, in our case of disabling custom SANs, would provide us the security guarantees we're looking for: any VM with a given service account should be able to get an SSH user certificate with that service account as a principal, and we can then allow that principal to SSH into destination VMs.

Anyhow, apologies for the wall of text and thanks again for your time!

ericnorris avatar Nov 08 '23 15:11 ericnorris

@adantop @ericnorris There's a workaround that you can use, for example this template creates a user certificate:

{
	"type": "user",
	"keyId": {{ toJson .KeyID }},
	"principals": {{ toJson .Principals }},
	"extensions": {
		"permit-X11-forwarding":   "",
		"permit-agent-forwarding": "",
		"permit-port-forwarding":  "",
		"permit-pty":              "",
		"permit-user-rc":          ""
	}
}

If you want to create user and server certificates, you can edit this template and decide what to do by looking at the .Token. Alternatively, you can create two different GCP provisioners, I think in GCP allows this because we can set the audience, but I haven't tested it.

maraino avatar Nov 08 '23 21:11 maraino

Thanks, I'll try it out

adantop avatar Nov 08 '23 22:11 adantop

Hey @dopey and @maraino, apologies for resurrecting an old thread, but now that I've had some time to dig into this, I wanted to point out a couple flaws with the workaround above that make it nearly unsuitable for use in production.

  1. In order to deploy the workaround for user SSH certificates using the host certificate template, we must either have two separate provisioners, one with the host-SSH-certificiate-but-really-user-SSH-certificate template, and one with the default template; or we have to put conditional logic in the template to allow the user to request that the template produce a host certificate instead:

     {{ if .Insecure.User.WantUserCertificate }}
     "type": "user",
    
  2. In order to retrieve a user certificate, you must use the step ssh certificate --host command, and specify a hostname, even though it's not a host certificate, nor is the hostname used:

    $ step ssh certificate --host --set WantUserCertificate=true --host $(hostname -f)
    

    This looks really weird, and only makes sense if you understand the trickery that is happening server-side.

  3. You cannot use step ssh proxycommand, since that will not use the --host flag.

None of these are individually a deal-breaker, but in total it ends up being a lot of extra effort to implement, and understanding how it works requires implicit knowledge. I think in my earlier comment I made a strong case for why this is no different than some of the existing provisioners, so I'm wondering if you could reconsider this PR?

ericnorris avatar Mar 28 '24 20:03 ericnorris

@ericnorris, the GCP provisioner was always intended for hosts. If you want to generate host and user certificates using the same provisioner, you might have a difficult time as you will need to deal with the template. Using given insecure variables is not the only way to change the behavior or a template; you can also use webhooks, look inside the token, or look at the principals.

The easiest way is to use two different provisioners, as this will simplify the template. On the client side, you will need to use the --host flag, as the GCP provisioner will verify that this is the given one.

You are right about step ssh proxycommand to automatically configure the certificate, proxycommand won't get a new certificate if one already exists in the agent, but I suppose you can wrap proxycommand in a simple script to get the certificate and install it in the agent if it is not already present.

maraino avatar Mar 28 '24 20:03 maraino

Thanks for the quick response @maraino!

the GCP provisioner was always intended for hosts.

Understood, and this PR would make it so that it could be used for GCP service accounts, much like the K8sSA provisioner. If you object to adding this functionality to the GCP provisioner and would prefer to keep it hosts-only, would you object to an entirely separate provisioner type?

Using given insecure variables is not the only way to change the behavior or a template; you can also use webhooks, look inside the token, or look at the principals.

I'm not sure we could look inside the token considering it's the same for user and host certificates; the token comes from the GCP metadata server and since the smalllstep CLI is generating it, it will be the same. Otherwise though, point taken.

The easiest way is to use two different provisioners, as this will simplify the template.

While it's certainly easier to have two of the GCP provisioners configured with different templates since the template doesn't need to have conditional logic in it, it doesn't hide the fact that you need to ask it for a host certificate even though you expect it to return a user certificate. The reason I took the insecure variable approach is fundamentally due to this weirdness; at least --set WantUserCertificate=true sends some sort of signal to someone else looking at the code that we're not getting back a host certificate.

...but I suppose you can wrap proxycommand in a simple script to get the certificate and install it in the agent if it is not already present.

That's a good point! That said, the big draw of the proxycommand command is that it will handle the renewal automatically; if I'm renewing the certificate before calling proxycommand I might as well not call proxycommand at all.


Fundamentally I think our use-case is not exotic or unreasonable - we'd like to use GCP service account identities to authenticate via SSH, just like we'd use Google user identities. This allows us to avoid having long-lived SSH keys that we'd need to distribute to hosts, which is a huge security win, in our opinion.

The GCP provisioner already has the "tough" part of supporting this implemented, since it is already validating and parsing the identity token, and this PR doesn't seem (to me) to be adding any additional attack surface or complexity to the codebase.

That said, if this is not something you'd like to support, that's fair! If so, then we'll likely need to explore other options since this use-case is important to us.

If it is something you'd be okay with supporting though, and this PR is not what you have in mind, could you point us in the right direction so that we could attempt to contribute this ourselves?

ericnorris avatar Mar 28 '24 21:03 ericnorris

@ericnorris, we will talk about this during our open-source triage meeting next week.

maraino avatar Mar 28 '24 21:03 maraino

@ericnorris after some discussion, our team generally agrees with the requested feature and the reasoning behind it. However, we were missing some key decision makers in our internal discussion. They'll be back from vacation next week. We'll discuss again, making sure we have the whole team's buy-in, and then I'll come back and update the issue with next steps.

dopey avatar Apr 10 '24 20:04 dopey

Great, thank you for keeping me updated @dopey! Looking forward to hearing back about this.

ericnorris avatar Apr 10 '24 21:04 ericnorris

Hey @ericnorris, apologies that it's taken a while to follow up.

We are open to integrating this change into the software under the condition that a configuration option be added to the ca.json to enable and disable the new behavior. By default the option to enable should be set to false. It's been a good while since this PR was first submitted (so, I'm not sure if it's still active) - but if someone from the community were to add our proposed change we would re-review and work with them to get it merged.

cheers 🍻

dopey avatar May 03 '24 19:05 dopey

Hey smallstep team (@maraino, @dopey),

We've updated the PR based on your initial feedback, please take a look!

Some things we'd like to call out:

  • We're not encoding the "sanitized" email address of the service account as a principal. This is because the portion before the @ sign is only unique to a project, and is not globally unique. This will conflict with the default behavior of step-cli since it appears that the sanitized email address is added to the certificate request:

    step ssh certificate \
      --provisioner=GCP --no-agent \
      --insecure --no-password \
      [email protected] ssh_user
    ✔ Provisioner: GCP (GCP)
    ✔ CA: https://stepca-test.us-central1-a.c.acme-smallstep-dev.internal
    The request was forbidden by the certificate authority: ssh certificate principals does not match - got [step-ca-runner [email protected]], want [sa_105642819181943026504 [email protected]].
    Re-run with STEPDEBUG=1 for more info.
    

    Notice that the step-ca-runner principal is rejected. This is solvable by passing the principals using the --principal flag to step-cli, which we think is acceptable. We don't think that the step-cli should be updated to account for this behavior, and instead we could call it out in the documentation for this provisioner.

  • We think it's worthwhile to follow Google's precedent of using the service account unique ID (present in the JWT as the sub claim) with the sa_ prefix. This is the format of usernames that they use in https://cloud.google.com/compute/docs/oslogin.

Validations

We ran the standard tests with the go test command on the ca module and they passed, we also ran certificate issuance tests in a GCP environment, below is the matrix of certificate issuance capabilites for enableSSHCAUser and disableSSHCAHost combinations

enableSSHCAUser=true enableSSHCAUser=false
disableSSHCAHost=true sshHost: no
sshUser: yes
tls: yes
sshHost: no
sshUser: no
tls: yes
disableSSHCAHost=false sshHost: yes
sshUser: yes
tls: yes
sshHost: yes
sshUser: no
tls: yes

Note: We also tested the same combinations having enableSSHCA=false and results are as expected (able to issue TLS and unable to issue sshHost and sshUser).

User certificate test evidence

step ssh certificate --provisioner=GCP --no-agent --insecure --no-password --principal [email protected] [email protected] ssh_user
✔ Provisioner: GCP (GCP)
✔ CA: https://stepca-test.us-central1-a.c.acme-smallstep-dev.internal
✔ Private Key: ssh_user
✔ Public Key: ssh_user.pub
✔ Certificate: ssh_user-cert.pub

step ssh inspect ssh_user-cert.pub
ssh_user-cert.pub:
        Type: [email protected] user certificate
        Public key: ECDSA-CERT SHA256:bv/GV10R6rSXl104dBucZ6Bb3/HywMD59/Jd4mtwP9Y
        Signing CA: ECDSA SHA256:ImQbM8XmPfuIWAsImjwsuhqAZW+3GlUlA2v0e8bzcIE (using ecdsa-sha2-nistp384)
        Key ID: "[email protected]"
        Serial: 4092897030217493163
        Valid: from 2024-05-09T19:25:20 to 2024-05-10T11:26:20
        Principals: 
                sa_105642819181943026504
                [email protected]
        Critical Options: (none)
        Extensions: 
                permit-agent-forwarding 
                permit-port-forwarding 
                permit-pty 
                permit-user-rc 
                permit-X11-forwarding 
        Signature:
                00:00:00:31:00:86:f7:ec:13:38:a1:c5:1b:b4:9b:b7:
                b4:46:c1:ec:70:b4:37:b0:22:58:9d:b4:80:bf:f7:58:
                13:62:57:c5:78:cc:3d:0c:33:46:f5:9b:e7:52:c0:ef:
                fa:06:e7:24:8c:00:00:00:30:1a:dd:d3:fb:e8:15:d4:
                70:2e:4b:b1:49:c4:70:b2:23:87:dd:56:30:c2:4c:40:
                17:d9:e4:c1:1d:8b:fb:ef:85:7c:c6:58:a9:d8:6c:17:
                ce:3e:1b:6c:82:0d:80:62:e6

Script that generated the test results

#!/usr/bin/bash

set -x


export STEPPATH=/etc/step

test_name=$1
ci_fqdn="$(hostname -f)"
sa_email="$(gcloud auth list --format=json --filter=status:ACTIVE | jq -r '.[0].account')"
sshHost=false
sshUser=false
tls=false

mkdir -vp "${test_name}"

step ssh certificate --host --provisioner=GCP --no-agent --insecure --no-password "${ci_fqdn}" "${test_name}/ssh_host"
if [[ $? -eq 0 ]]; then
  step ssh inspect "${test_name}/ssh_host-cert.pub"
  sshHost=true
fi

sleep 2
step ssh certificate --provisioner=GCP --no-agent --insecure --no-password --principal "${sa_email}" "${sa_email}" "${test_name}/ssh_user"
if [[ $? -eq 0 ]]; then
  step ssh inspect "${test_name}/ssh_user-cert.pub"
  sshUser=true
fi


sleep 2
step ca certificate --provisioner=GCP "${ci_fqdn}" "${test_name}/tls.crt" "${test_name}/tls.key"
if [[ $? -eq 0 ]]; then
  step certificate inspect "${test_name}/tls.crt"
  tls=true
fi

echo "{\"sshHost\": ${sshHost}, \"sshUser\": ${sshUser}, \"tls\": ${tls}}" > "${test_name}/result.json"

adantop avatar May 09 '24 22:05 adantop

👋🏽 @maraino

I've updated the PR according to the feedback provided, could you review again?

Thanks

adantop avatar Jun 04 '24 14:06 adantop