certmagic-storage-dynamodb icon indicating copy to clipboard operation
certmagic-storage-dynamodb copied to clipboard

Possible bug in locking mechanism

Open mholt opened this issue 4 years ago • 8 comments

Hello! We've received a few reports of certificates not renewing and they have similar symptoms:

  • DynamoDB is the storage module
  • Logs show that Caddy is trying to renew the certificate, but is waiting for a lock
  • The "LOCK-..." key for that certificate is present, but never deleted or considered stale
  • Even after deleting the LOCK- file, Caddy was still waiting for the lock (the Lock() method never returned)
  • Restarting Caddy after deleting the lock file manually resolved the issue

I am not familiar with DynamoDB but I felt it was important to raise the issue, as this is where my investigation has led me so far.

Are you able to reproduce this behavior at all?

/cc @liran-co and @jackellis

mholt avatar Oct 29 '20 17:10 mholt

Thanks for the report. I'll check into that tomorrow. At a minimum Dynamo should be deleting the lock keys as they expire, but perhaps either my example of how to configure the column for that is incorrect or the people reporting the problem left that part out. Anyway, I'll do some testing tomorrow and post an update when I have it.

fillup avatar Oct 29 '20 20:10 fillup

@mholt I see that my dynamodb examples do not include information about using the TTL feature nor am I setting any kind of TTL value for lock records. I can add that as an optional feature, but I also added some tests to verify that it can create locks when an existing one already exists but is expired and that works fine. So I'm not sure how or why some folks are having issues with that.

The LOCK-... records have a value with the time they should be considered expired in RFC3339 format. Can we ask one of the people who is having the problem what the value of the record is? To make sure something crazy didn't happen with how far out the expiration is.

fillup avatar Oct 30 '20 19:10 fillup

@fillup The only other idea I have is that TTL isn't guaranteed to prune. So if I set a TTL of say "in 5 mins", it could still be there for up to 24-48 hours (if i recall correctly).

I don't know Go so I can't fully follow the code but, when fetching a lock, do you also check the TTL value is not expired?

JackEllis avatar Oct 30 '20 20:10 JackEllis

The default storage implementation (file system) uses "active locking" where a timestamp within the file is updated every couple of seconds, and so even if the lock file exists after about 5 seconds, it is safe to consider it stale and then clean it up after-the-fact. Is this something that would benefit DynamoDB too? (Keep in mind I have absolutely 0 experience with DDB).

mholt avatar Oct 30 '20 22:10 mholt

just added a pr that would hopefully fix this. i removed ttl entirely and switched to some ddb ways of doing things. i also added the active locking file storage uses.

unlike @molt, i have a bunch of ddb experience but 0 experience with caddy and a little fuzzy on the inner workings of acme.

cmawhorter avatar Nov 19 '20 18:11 cmawhorter

@cmawhorter Fascinating. Just want to say thank you for all the work you've done here. There are so many of us that are so grateful for this, Cory!

P.S. I'm in the same place. I'm comfortable with DDB but have no clue about the inner workings of Caddy. Luckily @mholt knows enough for all of us ;)

JackEllis avatar Nov 19 '20 18:11 JackEllis

Fortunately, you don't need to know anything about Caddy or ACME to implement this properly; just adhere to the godoc description of the interface methods and as long as it works properly you're good to go.

mholt avatar Nov 19 '20 19:11 mholt

using the existing file storage in caddy as a reference was hugely helpful. most of the work was copy/paste from that.

yay, i was able to get things running locally against pebble (with remote ddb) and it seems to be working as expected. @fillup would probably know better though.

xcaddy build --with github.com/stampr/certmagic-storage-dynamodb/v2@stampr

cmawhorter avatar Nov 19 '20 20:11 cmawhorter