
Increment stat on failure to get CT log info

Status: Open. jsha opened this issue 4 years ago • 2 comments

If there's an error getting info for a CT log, we can hit this error case:

https://github.com/letsencrypt/boulder/blob/4205400a98c8285cccd478cb99e044d75f48e36d/ctpolicy/ctpolicy.go#L96-L100

This can happen, for instance, if we need to write to a temporal shard of a log, and the appropriate temporal shard isn't in the config. When that happens, we should increment a stat. Once the stat is implemented, we should follow up by adding a Prometheus alert.

jsha · Oct 12 '21 21:10

Ideally we should also have a check that looks ahead, so we can increment the stat before we hit a temporal shard cutover rather than after.

jsha · Oct 12 '21 21:10

Related error out of staging this morning.

 "Error":"getting SCTs: failed to get 2 SCTs, got 2 error(s): unable to get log info: no log found for group \"Let's Encrypt\" and expiry 2023-07-16 13:41:50 +0000 UTC; ct submission to \"Let's Encrypt Staging 1-3\" (\"http://stg-ct1.ct.letsencrypt.org/\") failed: Publisher.SubmitToSingleCTWithResult timed out after 88981 ms"

Edit: This was caused by sapling 2023h1 aging out as of 2023-04-16 00:00 UTC while sapling 2023h2 was not configured in Boulder.

[Screenshot from 2023-04-17 11-03-38]

pgporada · Apr 17 '23 14:04