mgob
mgob copied to clipboard
improve docs and error handling
Hi, I am running this as a docker swarm service on the manager node. Nothing really happens after it says "backup has started".
Here's the service I create:
docker service create --name mgob \
-p 8090:8090 \
--mount type=bind,src=./mgob/config,dst=/config \
--mount type=bind,src=./mgob/storage,dst=/storage \
-e backup__TARGET_URI=mongodb://user:[email protected]:27017,192.168.124.140:27027,192.168.124.139:27037/mongo_staging?replicaSet=mongo_staging_rs \
-e backup__S3_BUCKET=backups \
-e backup__S3_ACCESS_KEY=xxx \
-e backup__S3_SECRET_KEY=xxx \
--constraint node.labels.mongo.replica==1 \
maxisam/mgob:1.10.14-aws -LogLevel=info
Config file:
# mgob/config/backup.yml
#
scheduler:
cron: "*/2 * * * *" # run every two minutes for testing
retention: 14 # Retains 14 local backups
timeout: 60 # Operation timeout: 60 minutes
target:
host: "192.168.124.111,192.168.124.140"
database: "mongo_staging"
port: 27017
# params: "--ssl --authenticationDatabase admin" # Additional mongodump params, leave blank if not needed
noGzip: false # Disable gzip compression (false means compression is enabled)
s3:
# accessKey and secretKey are optional for AWS, if your Docker image has awscli
accessKey:
secretKey:
api: "S3v4"
Here are the logs:
[vagrant@manager1 ~]$ docker service logs -f mgob
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:54Z" level=info msg="mgob 1.10.14"
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:54Z" level=info msg="starting with config: &{LogLevel:info JSONLog:false Host: Port:8090 ConfigPath:/config StoragePath:/storage TmpPath:/tmp DataPath:/data Version:1.10.14 UseAwsCli:false HasGpg:false}"
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:54Z" level=info msg="mongodump version: 100.6.1 git version: 6d9d341edd33b892a2ded7bae529b0e2a96aae01 Go version: go1.19.3 os: linux arch: amd64 compiler: gc "
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:54Z" level=info msg="Minio Client is disabled."
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="aws-cli/1.22.46 Python/3.9.16 Linux/6.5.6-300.fc39.x86_64 botocore/1.23.46 "
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="gpg (GnuPG) 2.2.31 libgcrypt 1.9.4 Copyright (C) 2021 Free Software Foundation, Inc. License GNU GPL-3.0-or-later <https://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Home: /root/.gnupg Supported algorithms: Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH, CAMELLIA128, CAMELLIA192, CAMELLIA256 Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224 Compression: Uncompressed, ZIP, ZLIB, BZIP2 "
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="Google Storage is disabled."
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="Azure Storage is disabled."
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="RClone is disabled."
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="Next tmp cleanup run at 2024-05-08 13:00:00 +0000 UTC"
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="New job found, saving to store" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="Next run at 2024-05-08 12:14:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:13:55Z" level=info msg="starting http server on port 8090"
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:14:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:16:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:18:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:20:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:22:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:24:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:26:00Z" level=info msg="Backup started" plan=backup
I see nothing in /storage nor any errors. I don't receive any mongodb requests to my mongodb service.
When I docker exec -it 23a bash into the mgob container and run a mongodump --uri=<uri>, that works.
Also, going to the url 192.168.124.111:8090/backup/backup, I'd expect a request to run an ondemand backup - which also does not happen. I receive a 405 with nothing in the logs.
Is this the right image? maxisam/mgob:1.10.14-aws, I am using mongodb version 5.
Ok now I get these but no clue why it is failing!!
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:28:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:28:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:28:00Z" level=info msg="Next run at 2024-05-08 12:30:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:30:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:30:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:30:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:30:00Z" level=info msg="Next run at 2024-05-08 12:32:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:32:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:32:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:32:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1 | time="2024-05-08T12:32:00Z" level=info msg="Next run at 2024-05-08 12:34:00 +0000 UTC" plan=backup
What can I do?
Trying maxisam/mgob:2.0.20-aws but same issue.
I got it to work with this config
# mgob/config/backup.yml
#
scheduler:
cron: "*/2 * * * *" # run every day at 6:00 and 18:00 UTC
retention: 14 # Retains 14 local backups
timeout: 60 # Operation timeout: 60 minutes
target:
uri:
noGzip: false # Disable gzip compression (false means compression is enabled)
retry:
attempts: 3 # number of retries
backoffFactor: 60 # backoff factor * (2 ^ attemptCount) seconds
validation:
database:
uri:
noGzip: false
s3:
url:
bucket:
accessKey:
secretKey:
api: "S3v4"
and changing the environment variables like so
docker service create --name mgob \
-p 8090:8090 \
--mount type=bind,src=./mgob/config,dst=/config \
--mount type=bind,src=./mgob/storage,dst=/storage \
-e BACKUP__TARGET_URI=xxx \
-e BACKUP__VALIDATION_DATABASE_URI=xxx \
-e BACKUP__S3_BUCKET=xxx \
-e BACKUP__S3_ACCESSKEY=xxx \
-e BACKUP__S3_SECRETKEY=xxx \
--constraint node.labels.mongo.replica==1 \
maxisam/mgob:2.0.20-aws -LogLevel=debug
Readme uses lowercase for plan names however, the PLAN.md file suggests using all caps. So some discrepency in the docs
I was having some issues with the validation step, so I removed validation block completely and now I get this weird error which I can't figure out
level=error msg="BACKUP FAILED: mc config host for plan backup failed /bin/sh: mc: not found : exit status 127" plan=backup
what is mc?
Can we improve documentation? It looks like there's one more issue #108 where validation isn't that clear and I couldn't find anything in the documentation about this step.
Looks like minioUpload function is being called here
https://github.com/maxisam/mgob/blob/0bf29263228455012e473aa7e61aedc1f62ba782/pkg/backup/s3.go#L89
which is weird because I am using s3 in my config and I am using maxisam/mgob:2.0.20-aws image.
Should mc be included in this image by default?
It looks like after including BACKUP__S3_URL environment variable, s3Upload is called and we don't have the mc not found error anymore.
However, now we get
level=error msg="BACKUP FAILED: S3 uploading /tmp/backup-1715193000.gz to backup/backup-mongo-staging-backups failed : exit status 1" plan=backup
for
docker service create --name mgob \
-p 8090:8090 \
--mount type=bind,src=./mgob/config,dst=/config \
--mount type=bind,src=./mgob/storage,dst=/storage \
-e BACKUP__TARGET_URI=xxx \
-e BACKUP__S3_BUCKET=xxx \
-e BACKUP__S3_ACCESSKEY=xxx \
-e BACKUP__S3_SECRETKEY=xxx \
-e BACKUP__S3_URL=https://s3.amazonaws.com \
--constraint node.labels.mongo.replica==1 \
maxisam/mgob:2.0.20-aws -LogLevel=debug
is the url supposed to be https://s3.amazonaws.com? and can the error be any more verbose than "exit status 1"??
I believe two things can help me:
- [ ] Add some documentation about what the validation step is and how it's helpful
- [ ] Improve error handling and reporting. Most of the times I only get "exit status 1". What and why is not clear. In case it's failure from aws authentication perspective, those errors must be propagated and conveyed properly.
Digging further, the error seems to be
An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied`
When I try to execute aws s3 cp /tmp/planxxx.gz s3://bucket_name/ I get the above.
This is an issue with the configured AWS S3 bucket policy. However, in mgob, I get the output to be simply exit status 1. It would be nice to show the actual error message.
Thanks for all your feedback. I will try to get to them when I have time. Welcome to make a PR as well. validation step is just for testing the backup is correct or not, it basically just restore the backup to a temporary collection if I remember correctly.
You should check the test file and how it works from the Github Action. There are integration tests, which means it works for real environment.
MC is https://min.io/docs/minio/linux/reference/minio-mc.html