certificates
certificates copied to clipboard
[Bug]: On start up exiting with "Value log truncate required to run DB. This might result in data loss" error
Steps to Reproduce
With an instance of step-ca already running, start up a new instance
Your Environment
- OS - docker (linux/amd64)
-
step-ca
Version - 0.23.0
Expected Behavior
The new instance loads successfully
Actual Behavior
The new instance exists with the following logs
badger 2023/01/11 23:40:54 INFO: All 1 tables opened in 6ms
badger 2023/01/11 23:40:54 INFO: Replaying file id: 0 at offset: 291060
badger 2023/01/11 23:40:54 WARNING: Truncate Needed. File /home/step/db/000000.vlog size: 296890 Endoffset: 291060
Error opening database of Type badgerv2 with source /home/step/db: error opening Badger database: During db.vlog.open: Value log truncate required to run DB. This might result in data loss
Additional Context
This occurs when performing a blue green deployment
The instance that's running has the following logs
badger 2023/01/11 04:00:02 INFO: All 1 tables opened in 4ms
badger 2023/01/11 04:00:02 INFO: Replaying file id: 0 at offset: 4379
badger 2023/01/11 04:00:02 INFO: Replay took: 10.556853ms
2023/01/11 04:00:02 Starting Smallstep CA/0.23.0 (linux/amd64)
2023/01/11 04:00:02 Documentation: https://u.step.sm/docs/ca
2023/01/11 04:00:02 Community Discord: https://u.step.sm/discord
2023/01/11 04:00:02 Config file: /home/step/config/ca.json
2023/01/11 04:00:02 The primary server URL is **REDACTED**
2023/01/11 04:00:02 Root certificates are available at **REDACTED**
2023/01/11 04:00:02 X.509 Root Fingerprint: 6f2a88a4e76fc8e886fe5db652c6e14f6bff34e46d29262e4e13fa3d09a1543a
2023/01/11 04:00:02 Serving HTTPS on :9000 ...
time="2023-01-11T13:34:24Z" level=info certificate=**REDACTED**
time="2023-01-11T13:44:18Z" level=info certificate=**REDACTED**
time="2023-01-11T19:30:27Z" level=info certificate=**REDACTED**
time="2023-01-11T22:04:11Z" level=info duration="182.506µs" duration-ns=182506 fields.time="2023-01-11T22:04:11Z" method=GET name=ca path="/provisioners?limit=100" protocol=HTTP/2.0 referer= remote-address=10.253.6.252 request-id=cevj4mtov6f7al6khei0 size=869 status=200 user-agent="Smallstep CLI/0.23.0 (linux/amd64)" user-id=
time="2023-01-11T22:04:11Z" level=info duration="88.074µs" duration-ns=88074 fields.time="2023-01-11T22:04:11Z" method=GET name=ca path=/provisioners/WOKl_cXXYe1FL3VfpNqg72jCDrOW_jbwc1D3KxIaan0/encrypted-key protocol=HTTP/2.0 referer= remote-address=10.253.6.252 request-id=cevj4mtov6f7al6kheig size=586 status=200 user-agent="Smallstep CLI/0.23.0 (linux/amd64)" user-id=
time="2023-01-11T22:04:13Z" level=info duration="86.383µs" duration-ns=86383 fields.time="2023-01-11T22:04:13Z" method=GET name=ca path=/root/6f2a88a4e76fc8e886fe5db652c6e14f6bff34e46d29262e4e13fa3d09a1543a protocol=HTTP/2.0 referer= remote-address=10.253.6.252 request-id=cevj4ndov6f7al6khej0 size=764 status=200 user-agent="Smallstep CLI/0.23.0 (linux/amd64)" user-id=
time="2023-01-11T22:04:13Z" level=info certificate=**REDACTED**
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
Appears to be related https://github.com/dgraph-io/badger/issues/1353
Hey @aburgett87 👋 . Thanks for opening the issue! And apologies that it's taken so long to get back to you.
A bit of context - Badger is the default database used by step-ca
, but it's not the database we recommend when considering a production level deploy (https://smallstep.com/docs/step-ca/certificate-authority-server-production#high-availability). We recommend more robust databases because we don't have capacity to address and assist with database specific issues.
For this specific case, as you linked in your comment, this appears to be an issue specific to Badger DB. I think our questions is, what are your expectations from step-ca
here? (Serious question - there's a number of things that step-ca
could do, but we've generally tried to stay clear of assisting with DB management).
Hi @dopey No worries. That's understandable and I'm planning to move to a postgres SQL database ASAP. I suppose some information or guidance what I should do to clear it would be suffice. To get the services from bouncing up and down I've removed the database, let it re-create it and everything appears fine. I don't have a clue about the ramifications of doing that.
What might help:
I have a go-Program that reads the database and lists the certificates in openssl index.txt format.
While this was read-only, and step is not coming up and does not lock the file, I have added the truncate-option (.WithTruncate(true)) to this program. Tada, this cleans up the mess. After that step will come up again.
I have not fully verified what was lost, however I have the last created certificates from two days ago. Full backup was 3 days old, so fixed.
func main() {
db, err := badger.Open(badger.DefaultOptions(os.Args[1]).WithTruncate(true))
if err != nil {
panic(err)
}
defer db.Close()
listCerts(db, []byte("x509_certs"))
// listRevoked(db, []byte("revoked_x509_certs"))
}
On windows I can't use step-ca because after every restart this error comes up. CA is not useable anymore.
@Marvo2011 this sounds like a case of https://github.com/dgraph-io/badger/issues/744. The issue mentions truncating the DB helps, but that's not what we should make it do by default (I think?). I also see a mention of properly shutting down the process and closing the DB, so maybe that's not happening. It's likely that this shutdown needs something specific on Windows, because I'm starting/stopping CAs all the time on macOS, and I don't see the same issue. That said, it can also happen due to an unexpected stop of the CA, so we may want this for other OSs too.
We might be able to do something similar as done in https://github.com/decred/dcrdex/pull/2388, e.g.:
db, err := badger.Open(opts)
if errors.Is(err, badger.ErrTruncateNeeded) {
// https://github.com/dgraph-io/badger/issues/744
log.Warnf("NewFileDB badger db: %v", err)
// Try again with value log truncation enabled.
opts.Truncate = true
log.Warnf("Attempting to reopen badger DB with the Truncate option set...")
db, err = badger.Open(opts)
}
Short term solution would be to use a different DB backend, but I don't know if that's an option for you.
@hslatman Thanks, yes seams related to Windows. I moved to a debian system here start/stop works fine. A different DB backend is also a solution which are supported?
@Marvo2011 you can try the MySQL and Postgres ones. Admittedly, they are a bit "heavier" to use than badger, but they shouldn't have this issue.