certmagic
certmagic copied to clipboard
Storage check -- can it be removed?
https://github.com/caddyserver/certmagic/blob/76f61c2947a20d86ca37669dbdc0ed7a96fc6c5f/config.go#L456-L461
Depending on the storage this check can be quite expensive (in our case a remote KV store), and it happens in a rather "hot" area: Right when setting up a TLS connection. If we need to get a certificate from a CA this doesn't matter too much in the grand scheme of things, but if our storage has one we triple our roundtrips (1 to get the cert information from storage, and 2 for a write/read from storage to check it).
I think this check might have some value, but not here:
- If the storage is generally trustworthy, it should be checked before it gets configured for certmagic
- If the storage is generally untrustworthy and needs regular checks, it should be part of that specific storage
So, I'd propose to just drop these lines here, and possibly move the implementation of checkStorage
as an example into documentation. WDYT?
I'm open to optimizations, but first:
It should already only do the check if a certificate is needed, which should only be "hot" once every 60 days or so -- and even that runs in the background except the very first time. Shouldn't be blocking a TLS handshake after the first time it gets a certificate.
What kinds of performance impacts are you experiencing? How have you verified that the storage checks are causing performance problems?
When the server using on-demand TLS with certmagic starts up, its in-memory caches are empty, and that's what I'm trying to optimize right now. Our Storage
is using AWS DynamoDB (based on https://github.com/silinternational/certmagic-storage-dynamodb), and as much that scales, it does see quite some load I'd like to get rid off.
But, you're right indeed ... before it would do the storage check, obtainCert
first checks with storageHasCertResourcesAnyIssuer
for any existing resources. This call should return true
(our implementation of Exists
internally does a Load
and checks for whether that worked), and following the code from there I think it will pick up the certs through an actual Storage.Load
in Config.CacheManagedCertificate
/Config.loadManagedCertificate
.
Leaves me still with some not-so-nice performance if the above is true: I should end up with 3 distinct read requests in storageHasCertResourcesAnyIssuer
, and another 3 in loadManagedCertificate
. I'll have to measure this more closely to confirm this theory. Ideally I'd like this number to be 1, of course :)
I guess I could get there by
a) Have a tiny internal cache with a short expiration for Load
, so that the second load just grabs from that cache rather than roundtrip
b) recognize the storageHasCertResourcesAnyIssuer
and pre-load all keys into that cache (in DynamoDB through BatchGetItem
or possibly by using a smart query/index operation)
It also IMHO still leaves the question open whether the storage check would not be better in the storage itself, so that an implementation could decide how to check?
The point of the storage check is to ensure the storage is properly configured, it's not really dependent upon how "trustworthy" the storage mechanism is.
It's tempting to say that just the file system needs such a check to ensure there's enough disk space, but actually most problems I've heard of in production stem from getting its CertMagic config wrong, or the actual storage system itself is misconfigured.
Maybe we could make storage checks optional, but disabling them poses a risk too.
(I suppose ideally we'd optimize them. I have a TODO that they don't need to happen so frequently, but haven't gotten around to implementing that. Even if it's just a simple memory of last time the storage was checked.)
What about sth like this:
diff --git a/config.go b/config.go
index ed155d5..266415d 100644
--- a/config.go
+++ b/config.go
@@ -976,11 +976,19 @@ func (cfg *Config) getChallengeInfo(ctx context.Context, identifier string) (Cha
return Challenge{Challenge: chalInfo}, true, nil
}
-// checkStorage tests the storage by writing random bytes
-// to a random key, and then loading those bytes and
+// checkStorage tests the storage
+//
+// If the storage implements `StorageChecker` its StorageChecker.Check
+// method will be called, otherwise the storage is checked by writing
+// random bytes to a random key, and then loading those bytes and
// comparing the loaded value. If this fails, the provided
// cfg.Storage mechanism should not be used.
func (cfg *Config) checkStorage(ctx context.Context) error {
+ if sc, ok := cfg.Storage.(StorageChecker); ok {
+ // Storage knows how to check itself, so let it do that.
+ return sc.Check(ctx)
+ }
+
key := fmt.Sprintf("rw_test_%d", weakrand.Int())
contents := make([]byte, 1024*10) // size sufficient for one or two ACME resources
_, err := weakrand.Read(contents)
diff --git a/storage.go b/storage.go
index 0bb34cd..4fc0cfe 100644
--- a/storage.go
+++ b/storage.go
@@ -81,6 +81,12 @@ type Storage interface {
Stat(ctx context.Context, key string) (KeyInfo, error)
}
+type StorageChecker interface {
+ // Check returns whether the storage is "good to go" or
+ // whether it cannot be used right now
+ Check(ctx context.Context) error
+}
+
// Locker facilitates synchronization across machines and networks.
// It essentially provides a distributed named-mutex service so
// that multiple consumers can coordinate tasks and share resources.
For me it would allow me to implement the Check
method (for instance by only checking once), for existing code nothing would change, and other more advanced storages could indeed implement some "once in while" behavior without needing a change in certmagic itself?
Maybe, I still feel like storage problems are more related to configuration than actual implementation (usually -- assuming a proper implementation and a capable storage backend).
I feel like a switch on or off in the config is probably a better solution to this. Would that be OK with you?
I feel like a switch on or off in the config is probably a better solution to this. Would that be OK with you?
Sure, that would work for us! Shall I make a PR to that effect?
@ankon Sorry for my delayed reply. Yes, I'd be happy to review that. Probably have the checks enabled by default, with the option to disable them.
I would also love to see an option to disable this. Our use-case would be the same as @ankon but for a S3 storage driver :)
@mbardelmeijer That's good to know. But be aware that S3 is not safe to use as a storage backend by itself, since it cannot support atomic operations.
@mbardelmeijer @ankon I've implemented this in 5357574. Please feel free to try it out!
Thanks a lot!
We just rolled this out for a spin, and according to our database statistics this looks great -- no more rw_test_
keys in the "top contributors" diagrams at all. We're going to leave it running for a while now, and see whether overall statistics also show improvements.
Awesome, thanks for giving it a spin!