pkl icon indicating copy to clipboard operation
pkl copied to clipboard

Improve handling of CA certificates

Open odenix opened this issue 1 year ago • 11 comments

Instead of bundling Pkl's built-in CA certificates as a class path resource and loading them at runtime, pass them to the native image compiler as the default SSL context's trust store. This results in faster SSL initialization and is more consistent with how default certificates are handled when running on the JVM.

Further related improvements:

  • Change --ca-certificates and corresponding APIs to accept directories in addition to files. Passing a directory has the same effect as passing each of the directory's regular files.
  • Set the ~/.pkl/cacerts CLI default directly in CliBaseOptions.
  • Remove HttpClientBuilder methods addDefaultCliCertificates and addBuiltInCertificates.
  • Remove pkl-certs subproject and the optional dependencies on it.
  • Move PklCARoots.pem to pkl-cli/src/certs.
  • Rename occurrences of certificateFiles to certificatePaths.
  • Fix certificate related error messages that were missing an argument.
  • Prevent PklBugException if initialization of CliBaseOptions.httpClient fails.

odenix avatar Mar 29 '24 22:03 odenix

Could someone take a look at this? It’s a significant improvement. Once I found https://www.graalvm.org/latest/reference-manual/native-image/dynamic-features/CertificateManagement/#build-time-options, everything fell in place.

odenix avatar Apr 17 '24 17:04 odenix

I haven't forgotten about this one. Will take a look when I get the chance.

bioball avatar Apr 17 '24 23:04 bioball

To recap the current behavior:

  • pkl eval bird.pkl --ca-certificates not/a/valid/path.pem fails if and when an HTTPS request is made.
  • If --ca-certificates is not specified and ~/.pkl/cacerts does not exist or is empty, the built-in certificates are used.
  • It's impossible to not trust any certificate. I also can't think of a use case for this. To disable HTTPS, --allowed-modules should be used.

The changes proposed in this PR have many upsides:

  1. Nice separation of concerns: pkl-core implements a generic certificate file/directory feature, and pkl-cli implements the ~/.pkl/cacerts default.
  2. Directory scanning, including error handling, happens in the same place and at the same time as certificate file loading. Everything is lazy.
  3. Removing addBuiltinCertificates and addDefaultCertificates reduces the API surface of HttpClientBuilder.
  4. The behavior and default of --ca-certificates/certificatePaths is easy to explain and document.

Given that a non-existing ~/.pkl/cacerts directory must be tolerated, I decided to keep things simple and also tolerate non-existing files. If you feel strongly that this isn't good enough, I can think of three solutions:

  1. Use separate options for certificate files and directories. Then non-existing files can be distinguished from non-existing directories and can be treated differently.
  2. Change HttpClient to fail if a certificate path (file or directory) does not exist. Change pkl-cli to only set the ~/.pkl/cacerts default if the directory exists.
  3. Move directory scanning to pkl-cli and make it eager.

I think I like (2) relatively best because it keeps most of the listed upsides. What do you think?

One more question: Are we certain that certificateUris is still needed? If not, I'd like to remove it until there is a proven need.

odenix avatar Apr 26 '24 04:04 odenix

In 0.25 (the current release), pkl eval will error immediately if the given --ca-certificates files are invalid. I missed this--in the current dev version, it will only fail when the first HTTPS request is made. This feels like a behavior regression to me. In that case, perhaps we shouldn't initialize the HttpClient lazily? I think this is your option 3?

I'll do some benchmarking here; I'm curious how much time we're saving by trying to make things lazy.

One more question: Are we certain that certificateUris is still needed? If not, I'd like to remove it until there is a proven need.

I don't think this is needed anymore, now that we don't use classpath resources for embedding the cert files.

bioball avatar Apr 26 '24 18:04 bioball

We discussed before that being lazy means that SSL initialization errors are deferred. The conclusion was that being lazy is in the Pkl spirit. Lazy SSL initialization is why 0.26 is much snappier than 0.25. The JVM also initializes SSL lazily. It’s known to be expensive.

odenix avatar Apr 26 '24 19:04 odenix

I don't think this is needed anymore, now that we don't use classpath resources for embedding the cert files.

Removed.

odenix avatar Apr 26 '24 21:04 odenix

You're right; initializing cert stuff takes quite a while. In particular, CertificateFactory#generateCertificates() takes ~160ms to initialize when passed a pretty standard cert bundle. I think that's expensive enough to justify "let's defer spawning until it's actually needed". In that case, option 2 indeed sounds like the best approach. I also think that unless there's a good use-case for accepting a directory in our CLI flag, let's go back to accepting just a cert bundle.

Also: I'm kind of shocked that it takes so along to initialize. I'm curious if there's some better approach here.

Another observation: SSLContext.getDefault() takes ~5ms to return on native, and about 100ms to return in jpkl.

bioball avatar Apr 26 '24 23:04 bioball

I also think that unless there's a good use-case for accepting a directory in our CLI flag, let's go back to accepting just a cert bundle

I think that accepting a directory makes sense both from a user perspective (can use a directory other than ~/.pkl/cacerts) and from an internal perspective (see "upsides"). For comparison, cURL supports both files (--cacert) and directories (--capath).

Also: I'm kind of shocked that it takes so along to initialize. I'm curious if there's some better approach here.

Default SSL context with baked-in certificates and lazy SSL initialization is already a significant improvement compared to 0.25.

Perhaps SSL initialization time for user-provided certificates could be further improved by accepting a .p12 file instead of .pem files and changing the javax.net.ssl.trustStore property at runtime instead of creating our own SSL context. But this also has downsides.

https://www.graalvm.org/latest/reference-manual/native-image/dynamic-features/CertificateManagement/#runtime-options

odenix avatar Apr 27 '24 02:04 odenix

I think that accepting a directory makes sense both from a user perspective (can use a directory other than ~/.pkl/cacerts) and from an internal perspective (see "upsides"). For comparison, cURL supports both files (--cacert) and directories (--capath).

Hmm... yeah, I don't feel strongly about this. Sounds fine to me to accept directories too, especially since both cURL and wget do. But I think let's follow their lead in this case, and make it a separate flag (I think maybe call it --ca-directory), which would also guard against unintentional configuration errors.

bioball avatar Apr 29 '24 18:04 bioball

But I think let's follow their lead in this case, and make it a separate flag (I think maybe call it --ca-directory), which would also guard against unintentional configuration errors.

Which config errors are you trying to guard against? I feel a single option is more elegant and keeps the CLI/API smaller. Also --ca-certificates works well for both files and directories.

odenix avatar Apr 30 '24 17:04 odenix

The error would be accidentally setting the flag to a directory, when it was supposed to be a file, or vice versa.

I also like that:

  • This aligns with how curl and wget work
  • This is simply a new feature, rather than a change in behavior of an existing flag.

bioball avatar Apr 30 '24 18:04 bioball