caddy icon indicating copy to clipboard operation
caddy copied to clipboard

Ignore / Overwrite blank certificate files

Open yroc92 opened this issue 3 years ago • 9 comments

Intro

When a disk storage source lacks space, Caddy can potentially create a certificate file but fail to write the contents of the PEM to the file, so you're left with a blank file. This will throw errors similar to the one below and the hosted site will be down.

{
   "level":"error",
   "ts":1657557563.517728,
   "msg":"running dynamically-loaded config failed",
   "error":"loading http app module: provision http: getting tls app: loading tls app module: provision tls: provisioning automation policy 1: loading TLS automation management module: position 0: loading module 'internal': provision tls.issuance.internal: loading pki app module: provision pki: provisioning CA 'local': decoding intermediate certificate PEM: no PEM block found"
}

Proposed Behavior

Caddy should perhaps treat the empty file as if it doesn't exist. Whether the blank file should be removed / overwritten, I'll defer to your judgement. If the storage issue hasn't been addressed, then a new error could be thrown indicating that the attempt was made.

Context

In my use case, we saw the following error indicating that the key file wasn't written to because of a lack of storage

Jul 11 12:04:57 m08296 caddy[443]: {"level":"info","ts":1657541097.5037012,"logger":"pki","msg":"intermediate expires soon; renewing","ca":"local","time_remaining":65337.496346244}
Jul 11 12:04:57 m08296 caddy[443]: {"level":"error","ts":1657541097.513083,"logger":"pki","msg":"renewing intermediate certificates","error":"generating new certificate: saving intermediate certificate: write /var/lib/caddy/.local/share/c
addy/pki/authorities/local/intermediate.crt: no space left on device","ca":"local"}
Jul 11 12:14:57 m08296 caddy[443]: {"level":"info","ts":1657541697.5043483,"logger":"pki","msg":"intermediate expires soon; renewing","ca":"local","time_remaining":64737.49569013}
Jul 11 12:14:57 m08296 caddy[443]: {"level":"error","ts":1657541697.5119374,"logger":"pki","msg":"renewing intermediate certificates","error":"generating new certificate: saving intermediate certificate: write /var/lib/caddy/.local/share/
caddy/pki/authorities/local/intermediate.crt: no space left on device","ca":"local"}
Jul 11 12:20:06 m08296 caddy[443]: {"level":"info","ts":1657542006.4763057,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"<redacted IP Address>","subjects":["<redacted IP Address>"],"expiration":1657410949,"remaining":-1
31057.47625807}
Jul 11 12:20:06 m08296 caddy[443]: {"level":"error","ts":1657542006.484299,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["<redacted IP Address>"],"not_after":1657410949,"error":"failed storage check: open /va
r/lib/caddy/.local/share/caddy/rw_test_5993988438453682016: no space left on device - storage is probably misconfigured"}
Jul 11 12:21:29 m08296 caddy[443]: {"level":"info","ts":1657542089.1113331,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"<redacted IP Address>","subjects":["<redacted IP Address>"],"expiration":1657410949,"remaining":-1
31140.111243954}
Jul 11 12:21:29 m08296 caddy[443]: {"level":"error","ts":1657542089.1183534,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["<redacted IP Address>"],"not_after":1657410949,"error":"failed storage check: open /v
ar/lib/caddy/.local/share/caddy/rw_test_9127296058851621699: no space left on device - storage is probably misconfigured"}
Jul 11 12:21:41 m08296 caddy[443]: {"level":"info","ts":1657542101.2605202,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"<redacted IP Address>","subjects":["<redacted IP Address>"],"expiration":1657410949,"remaining":-1
31152.260509214}
Jul 11 12:21:41 m08296 caddy[443]: {"level":"error","ts":1657542101.2636528,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["<redacted IP Address>"],"not_after":1657410949,"error":"failed storage check: open /v
ar/lib/caddy/.local/share/caddy/rw_test_6185278676679082887: no space left on device - storage is probably misconfigured"}
Jul 11 12:24:57 m08296 caddy[443]: {"level":"info","ts":1657542297.5031638,"logger":"pki","msg":"intermediate expires soon; renewing","ca":"local","time_remaining":64137.496844947}
Jul 11 12:24:57 m08296 caddy[443]: {"level":"error","ts":1657542297.5078547,"logger":"pki","msg":"renewing intermediate certificates","error":"generating new certificate: saving intermediate certificate: write /var/lib/caddy/.local/share/
caddy/pki/authorities/local/intermediate.crt: no space left on device","ca":"local"}
Jul 11 12:29:17 m08296 caddy[443]: {"level":"info","ts":1657542557.5471196,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"<redacted IP Address>","subjects":["<redacted IP Address>"],"expiration":1657410949,"remaining":-1
31608.547068464}
Jul 11 12:29:17 m08296 caddy[443]: {"level":"error","ts":1657542557.5580342,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["<redacted IP Address>"],"not_after":1657410949,"error":"failed storage check: open /v
ar/lib/caddy/.local/share/caddy/rw_test_2617158143528752081: no space left on device - storage is probably misconfigured"}
Jul 11 12:34:57 m08296 caddy[443]: {"level":"info","ts":1657542897.5044258,"logger":"pki","msg":"intermediate expires soon; renewing","ca":"local","time_remaining":63537.49562478}
Jul 11 12:34:57 m08296 caddy[443]: {"level":"error","ts":1657542897.516758,"logger":"pki","msg":"renewing intermediate certificates","error":"generating new certificate: saving intermediate certificate: write /var/lib/caddy/.local/share/c
addy/pki/authorities/local/intermediate.crt: no space left on device","ca":"local"}
Jul 11 12:36:50 m08296 caddy[443]: {"level":"info","ts":1657543010.710098,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"<redacted IP Address>","subjects":["<redacted IP Address>"],"expiration":1657410949,"remaining":-13
2061.710078924}
Jul 11 12:36:50 m08296 caddy[443]: {"level":"error","ts":1657543010.712878,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["<redacted IP Address>"],"not_after":1657410949,"error":"failed storage check: open /va
r/lib/caddy/.local/share/caddy/rw_test_500958991974222029: no space left on device - storage is probably misconfigured"}
Jul 11 12:44:57 m08296 caddy[443]: {"level":"info","ts":1657543497.50362,"logger":"pki","msg":"intermediate expires soon; renewing","ca":"local","time_remaining":62937.496416332}
Jul 11 12:44:57 m08296 caddy[443]: {"level":"error","ts":1657543497.5116303,"logger":"pki","msg":"renewing intermediate certificates","error":"generating new certificate: saving intermediate certificate: write /var/lib/caddy/.local/share/
caddy/pki/authorities/local/intermediate.crt: no space left on device","ca":"local"}
Jul 11 12:54:08 m08296 caddy[443]: {"level":"info","ts":1657544048.851557,"logger":"tls.on_demand","msg":"attempting certificate renewal","server_name":"<redacted IP Address>","subjects":["<redacted IP Address>"],"expiration":1657410949,"remaining":-13
3099.851503154}
Jul 11 12:54:08 m08296 caddy[443]: {"level":"error","ts":1657544048.85698,"logger":"tls.on_demand","msg":"renewing certificate on-demand failed","subjects":["<redacted IP Address>"],"not_after":1657410949,"error":"failed storage check: open /var
/lib/caddy/.local/share/caddy/rw_test_3527028469984628230: no space left on device - storage is probably misconfigured"}

These blank files and failing site were noticed after the lack of storage space was already resolved:

root@m08296:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            992M     0  992M   0% /dev
tmpfs           201M   21M  180M  11% /run
/dev/vda1        52G   11G   39G  23% /
tmpfs          1003M     0 1003M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs          1003M     0 1003M   0% /sys/fs/cgroup
root@m08296:~# 

This was a local-pki intermediate cert, so there was no associated JSON file - just {root,intermediate}.{crt,key}. Intermediate.key was missing, but root.{crt,key} were both fine.

yroc92 avatar Jul 17 '22 20:07 yroc92

Another possible cause of the lack of storage is log rotation - Ubuntu rotates logs and Caddy might have attempted to write a cert while storage was full, only for more storage to be made available once logs are rotated.

yroc92 avatar Jul 18 '22 15:07 yroc92

@mholt would it make sense for the storage module to keep certs in memory until a successful write occurs?

yroc92 avatar Jul 20 '22 15:07 yroc92

I've been thinking about this and staring at code for a little while. I'm undecided on a few things.

We can either:

  • delete the file if writing fails (I don't think there should be concurrent writers so hopefully that's safe to do!) -- of course there'd still be an error at the time;
  • or, if we read an empty file but we require a non-empty file (for example, any certificate/key/metadata file written by Caddy should be non-empty) we can manually set the error to something like fs.ErrNotExist and let the normal flow continue; this should cause Caddy to fall back to its behavior as if the file didn't already exist (i.e. create a new cert) and if the issue has been resolved (enough disk space) then it should continue normally.

One issue is that this "storage space exhausted" problem is confined mainly to the file system storage backend, but the code that deals with certs isn't aware of the type of backend. (Sure, we can use a type assertion.) But maybe it's OK if the failover logic happens for other kinds of backends too (like databases).

We probably need to keep the behavior of requiring the certs/keys written to storage. Otherwise Caddy will start using credentials that aren't preserved, which could make some things messy/problematic. Something we've been careful to do from day 1 is immediately persist certs and keys to storage so they can be reused.

mholt avatar Jul 20 '22 17:07 mholt

The persistence makes sense.

Will your second solution cause a loop if there's still no space available?

yroc92 avatar Jul 22 '22 05:07 yroc92

Yeah, probably.

Sigh...

mholt avatar Jul 23 '22 01:07 mholt

Let's do option 1 then, and then when it goes to retry it can attempt? Would it attempt at the same rate that TLS cert issuance is attempted?

yroc92 avatar Jul 28 '22 06:07 yroc92

Following up on my previous comment, could we just bundle the storage availability check with the certificate issuance attempt? That way it can do exponential retries and we won't get caught in a cycle.

Thoughts?

yroc92 avatar Aug 13 '22 07:08 yroc92

Yeah, so now I'm wondering if we should just do that. We do it for normal (server) certificates, but not (yet) CA certificates. I will look into that!

mholt avatar Aug 17 '22 16:08 mholt

Semi-related, this user had (I think) broken keys in their storage for some reason, and clearing storage fixed the problem. https://caddy.community/t/can-not-get-certificate-from-issuer/17607

It seems like we could possibly try to wipe storage (reset keys) if we see we get errors like JWS verification error from Let's Encrypt.

francislavoie avatar Oct 31 '22 07:10 francislavoie