mgob icon indicating copy to clipboard operation
mgob copied to clipboard

improve docs and error handling

Open madhums opened this issue 1 year ago • 10 comments

Hi, I am running this as a docker swarm service on the manager node. Nothing really happens after it says "backup has started".

Here's the service I create:

docker service create --name mgob \
  -p 8090:8090 \
  --mount type=bind,src=./mgob/config,dst=/config \
  --mount type=bind,src=./mgob/storage,dst=/storage \
  -e backup__TARGET_URI=mongodb://user:[email protected]:27017,192.168.124.140:27027,192.168.124.139:27037/mongo_staging?replicaSet=mongo_staging_rs \
  -e backup__S3_BUCKET=backups \
  -e backup__S3_ACCESS_KEY=xxx \
  -e backup__S3_SECRET_KEY=xxx \
  --constraint node.labels.mongo.replica==1 \
  maxisam/mgob:1.10.14-aws -LogLevel=info

Config file:

# mgob/config/backup.yml
# 
scheduler:
  cron: "*/2 * * * *" # run every two minutes for testing
  retention: 14 # Retains 14 local backups
  timeout: 60 # Operation timeout: 60 minutes

target:
  host: "192.168.124.111,192.168.124.140"
  database: "mongo_staging"
  port: 27017
  # params: "--ssl --authenticationDatabase admin" # Additional mongodump params, leave blank if not needed
  noGzip: false # Disable gzip compression (false means compression is enabled)

s3:
  # accessKey and secretKey are optional for AWS, if your Docker image has awscli
  accessKey:
  secretKey:
  api: "S3v4"

Here are the logs:

[vagrant@manager1 ~]$ docker service logs -f mgob
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="mgob 1.10.14"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="starting with config: &{LogLevel:info JSONLog:false Host: Port:8090 ConfigPath:/config StoragePath:/storage TmpPath:/tmp DataPath:/data Version:1.10.14 UseAwsCli:false HasGpg:false}"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="mongodump version: 100.6.1 git version: 6d9d341edd33b892a2ded7bae529b0e2a96aae01 Go version: go1.19.3    os: linux    arch: amd64    compiler: gc "
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="Minio Client is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="aws-cli/1.22.46 Python/3.9.16 Linux/6.5.6-300.fc39.x86_64 botocore/1.23.46 "
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="gpg (GnuPG) 2.2.31 libgcrypt 1.9.4 Copyright (C) 2021 Free Software Foundation, Inc. License GNU GPL-3.0-or-later <https://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.  Home: /root/.gnupg Supported algorithms: Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,         CAMELLIA128, CAMELLIA192, CAMELLIA256 Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224 Compression: Uncompressed, ZIP, ZLIB, BZIP2 "
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Google Storage is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Azure Storage is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="RClone is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Next tmp cleanup run at 2024-05-08 13:00:00 +0000 UTC"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="New job found, saving to store" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Next run at 2024-05-08 12:14:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="starting http server on port 8090"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:14:00Z" level=info msg="Backup started" plan=backup


mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:16:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:18:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:20:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:22:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:24:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:26:00Z" level=info msg="Backup started" plan=backup

I see nothing in /storage nor any errors. I don't receive any mongodb requests to my mongodb service.

When I docker exec -it 23a bash into the mgob container and run a mongodump --uri=<uri>, that works. Also, going to the url 192.168.124.111:8090/backup/backup, I'd expect a request to run an ondemand backup - which also does not happen. I receive a 405 with nothing in the logs.

Is this the right image? maxisam/mgob:1.10.14-aws, I am using mongodb version 5.

madhums avatar May 08 '24 12:05 madhums

Ok now I get these but no clue why it is failing!!

mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:28:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:28:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:28:00Z" level=info msg="Next run at 2024-05-08 12:30:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=info msg="Next run at 2024-05-08 12:32:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=info msg="Next run at 2024-05-08 12:34:00 +0000 UTC" plan=backup

What can I do?

madhums avatar May 08 '24 12:05 madhums

Trying maxisam/mgob:2.0.20-aws but same issue.

madhums avatar May 08 '24 12:05 madhums

I got it to work with this config

# mgob/config/backup.yml
# 
scheduler:
  cron: "*/2 * * * *" # run every day at 6:00 and 18:00 UTC
  retention: 14 # Retains 14 local backups
  timeout: 60 # Operation timeout: 60 minutes

target:
  uri:
  noGzip: false # Disable gzip compression (false means compression is enabled)

retry:
  attempts: 3 # number of retries
  backoffFactor: 60 # backoff factor  * (2 ^ attemptCount) seconds

validation:
  database:
    uri:
    noGzip: false

s3:
  url:
  bucket:
  accessKey:
  secretKey:
  api: "S3v4"

and changing the environment variables like so

docker service create --name mgob \
  -p 8090:8090 \
  --mount type=bind,src=./mgob/config,dst=/config \
  --mount type=bind,src=./mgob/storage,dst=/storage \
  -e BACKUP__TARGET_URI=xxx \
  -e BACKUP__VALIDATION_DATABASE_URI=xxx \
  -e BACKUP__S3_BUCKET=xxx \
  -e BACKUP__S3_ACCESSKEY=xxx \
  -e BACKUP__S3_SECRETKEY=xxx \
  --constraint node.labels.mongo.replica==1 \
  maxisam/mgob:2.0.20-aws -LogLevel=debug

Readme uses lowercase for plan names however, the PLAN.md file suggests using all caps. So some discrepency in the docs

madhums avatar May 08 '24 17:05 madhums

I was having some issues with the validation step, so I removed validation block completely and now I get this weird error which I can't figure out

level=error msg="BACKUP FAILED: mc config host for plan backup failed /bin/sh: mc: not found : exit status 127" plan=backup

what is mc?

madhums avatar May 08 '24 17:05 madhums

Can we improve documentation? It looks like there's one more issue #108 where validation isn't that clear and I couldn't find anything in the documentation about this step.

madhums avatar May 08 '24 17:05 madhums

Looks like minioUpload function is being called here

https://github.com/maxisam/mgob/blob/0bf29263228455012e473aa7e61aedc1f62ba782/pkg/backup/s3.go#L89

which is weird because I am using s3 in my config and I am using maxisam/mgob:2.0.20-aws image.

Should mc be included in this image by default?

madhums avatar May 08 '24 17:05 madhums

It looks like after including BACKUP__S3_URL environment variable, s3Upload is called and we don't have the mc not found error anymore.

However, now we get

level=error msg="BACKUP FAILED: S3 uploading /tmp/backup-1715193000.gz to backup/backup-mongo-staging-backups failed : exit status 1" plan=backup

for

docker service create --name mgob \
  -p 8090:8090 \
  --mount type=bind,src=./mgob/config,dst=/config \
  --mount type=bind,src=./mgob/storage,dst=/storage \
  -e BACKUP__TARGET_URI=xxx \
  -e BACKUP__S3_BUCKET=xxx \
  -e BACKUP__S3_ACCESSKEY=xxx \
  -e BACKUP__S3_SECRETKEY=xxx \
  -e BACKUP__S3_URL=https://s3.amazonaws.com \
  --constraint node.labels.mongo.replica==1 \
  maxisam/mgob:2.0.20-aws -LogLevel=debug

is the url supposed to be https://s3.amazonaws.com? and can the error be any more verbose than "exit status 1"??

madhums avatar May 08 '24 18:05 madhums

I believe two things can help me:

  • [ ] Add some documentation about what the validation step is and how it's helpful
  • [ ] Improve error handling and reporting. Most of the times I only get "exit status 1". What and why is not clear. In case it's failure from aws authentication perspective, those errors must be propagated and conveyed properly.

madhums avatar May 09 '24 06:05 madhums

Digging further, the error seems to be

An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied`

When I try to execute aws s3 cp /tmp/planxxx.gz s3://bucket_name/ I get the above.

This is an issue with the configured AWS S3 bucket policy. However, in mgob, I get the output to be simply exit status 1. It would be nice to show the actual error message.

madhums avatar May 25 '24 11:05 madhums

Thanks for all your feedback. I will try to get to them when I have time. Welcome to make a PR as well. validation step is just for testing the backup is correct or not, it basically just restore the backup to a temporary collection if I remember correctly.

You should check the test file and how it works from the Github Action. There are integration tests, which means it works for real environment.

MC is https://min.io/docs/minio/linux/reference/minio-mc.html

maxisam avatar Jun 01 '24 16:06 maxisam