efs-backup
Incomplete backups
Hello,
I see that this issue has been closed but I'm experiencing a very similar problem.
I had deployed this solution last week (2018-08-09) and it initially ran successfully, but now reports incomplete backups:
{
  "BackupId": "9d74c8a7",
  "BackupPrefix": "/",
  "BackupStartTime": "2018-08-13T16:08:17",
  "BackupStatus": "Incomplete",
  "BackupStopTime": "2018-08-13T16:19:10",
  "BackupWindow": "180",
  "CreateHardlinksStartTime": "2018-08-13T16:11:12",
  "CreateHardlinksStopTime": "2018-08-13T16:14:39",
  "DestinationEfsId": "fs-387c3591",
  "DestinationEfsSize": 54811144192,
  "DestinationPerformanceMode": "maxIO",
  "EC2Logs": "https://s3.amazonaws.com/nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/ec2-logs/efs-backup-backup-20180813-1619.log",
  "ExpireItem": "1541952496",
  "InstanceType": "c5.xlarge",
  "IntervalTag": "daily",
  "Message": "The EFS backup was incomplete. The backup window expired before the full backup was completed.",
  "NumberOfFiles": 41948,
  "NumberOfFilesTransferred": 1785,
  "RemoveSnapshotStartTime": "2018-08-13T16:10:16",
  "RemoveSnapshotStopTime": "2018-08-13T16:11:12",
  "RetainPeriod": "7",
  "S3BucketSize": 7380943,
  "SourceBurstCreditBalance": 2308974418330,
  "SourceBurstCreditBalancePostBackup": 2308974418330,
  "SourceEfsId": "fs-35aae49c",
  "SourceEfsSize": 59084480512,
  "SourcePerformanceMode": "generalPurpose",
  "SourcePermittedThroughput": 104857600,
  "TotalFileSize": 60585862733,
  "TotalTransferredFileSize": 8997712481
}
I have the backup window set to 6 hours, although the scripts only run for a couple minutes, which seems to contradict the error message.
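For anyone who wants to check the same thing on their own record, here is a rough sketch that compares the elapsed time between BackupStartTime and BackupStopTime with the backup window. It assumes GNU date (as on Amazon Linux), that the timestamps are UTC, and that BackupWindow is expressed in minutes; the variable names are my own, not part of the solution.

```shell
# Values copied from the backup record above.
start_time="2018-08-13T16:08:17"
stop_time="2018-08-13T16:19:10"
window_minutes=180   # assumption: BackupWindow is in minutes

# Convert both timestamps to epoch seconds (GNU date syntax, UTC assumed).
start_epoch=$(date -u -d "$start_time" +%s)
stop_epoch=$(date -u -d "$stop_time" +%s)

# Whole minutes between start and stop.
elapsed_minutes=$(( (stop_epoch - start_epoch) / 60 ))

if [ "$elapsed_minutes" -lt "$window_minutes" ]; then
  verdict="window not exhausted"
else
  verdict="window exhausted"
fi

echo "elapsed: ${elapsed_minutes} min of ${window_minutes} min (${verdict})"
```

For this run the elapsed time comes out at about 10 minutes, well short of either a 180-minute or a 6-hour window.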
SSM stderr:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    10  100    10    0     0  16835      0 --:--:-- --:--:-- --:--:-- 10000
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    19  100    19    0     0  35514      0 --:--:-- --:--:-- --:--:-- 19000
kill: sending signal to 15243 failed: No such process
SSM stdout:
-- 2018-08-13T16:19:08 -- uploading cloud init logs
Completed 75.6 KiB/75.6 KiB (196.1 KiB/s) with 1 file(s) remaining
upload: var/log/cloud-init-output.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/ec2-logs/efs-backup-backup-20180813-1619.log
-- 2018-08-13T16:19:09 -- upload ec2 cloud init logs to S3, status: 0
-- 2018-08-13T16:19:09 -- uploading backup (fpsync) logs
Completed 256.0 KiB/314.1 KiB (689.4 KiB/s) with 1 file(s) remaining
Completed 314.1 KiB/314.1 KiB (745.1 KiB/s) with 1 file(s) remaining
upload: tmp/efs-backup.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/efs-backup-logs/efs-backup-backup-fpsync-20180813-1619.log
-- 2018-08-13T16:19:10 -- upload backup fpsync logs to S3 status: 0
-- 2018-08-13T16:19:10 -- uploading backup (rsync delete) logs
Completed 139 Bytes/139 Bytes (367 Bytes/s) with 1 file(s) remaining
upload: tmp/efs-backup-rsync.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/efs-backup-logs/efs-backup-backup-rsync-delete-20180813-1619.log
-- 2018-08-13T16:19:10 -- upload rsync delete logs to S3 status: 0
-- 2018-08-13T16:19:10 -- fpsync foreground process-id: 15243
-- 2018-08-13T16:19:10 -- kill with SIGTERM, status: 0
-- 2018-08-13T16:19:10 -- exiting loop
-- 2018-08-13T16:19:10 -- Number of files: 41948
-- 2018-08-13T16:19:10 -- Number of files transferred: 1785
-- 2018-08-13T16:19:10 -- Total file size: 60585862733
-- 2018-08-13T16:19:10 -- Total transferred file size: 8997712481
-- 2018-08-13T16:19:10 -- source efs BurstCreditBalance after backup: 2.30897441833e+12
fpsyncStatus: 1
rsync delete status: 0
-- 2018-08-13T16:19:10 -- backup finish time: 2018-08-13T16:19:10
-- 2018-08-13T16:19:10 -- backup incomplete (id: 9d74c8a7)
-- 2018-08-13T16:19:11 -- dynamo db update status: 0
-- 2018-08-13T16:19:11 -- updating lifecycle hook
-- 2018-08-13T16:19:11 -- lifecycle hook update status: 0
efs-backup-backup-20180813-1619.log
EC2 log attached. Any advice or tips would be much appreciated, thanks!
Hello, I can see in the EC2 logs fpsync finished with status 1.
-- 2018-08-13T16:14:39 -- sudo "PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin" /usr/local/bin/fpsync -n 64 -o "-a --stats --numeric-ids --log-file=/tmp/efs-backup.log" /backup/ /mnt/backups/efs-backup/daily.0/
which: no mail in (/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin)
fpsyncStatus:1
I am not sure why, but the backup failed because fpsync did not run successfully. We also upload the backup-fpsync logs to S3. Do you have those? Maybe we can get more information there.
As a troubleshooting step, I would suggest replicating the script manually. Try running the same command the script runs on the EC2 box and check the output:
sudo "PATH=$PATH" /usr/local/bin/fpsync -n $_thread_count -v -o "-a --stats --numeric-ids --log-file=/tmp/efs-backup.log" /backup/ /mnt/backups/$efsid/$interval.0/ 1>/tmp/efs-fpsync.log
fpsyncStatus=$?
Also, can you please tell us whether it reports an incomplete backup every day, or whether the issue is intermittent?
It's completed successfully the last two days (of course), so it seems to be intermittent.
Here's the efs-backup log from the same run as above. efs-backup-backup-fpsync-20180813-1619.log
If it completed then those backups are 'good'. I believe we need to change the following line to have both stdout+stderr added to the log; currently, backup-fpsync only has stdout, which is not helpful.
https://github.com/awslabs/efs-backup/blob/255df693bde42d3e4a3114b9c9c05b1dd35b7144/source/scripts/efs-backup-fpsync.sh#L118
Change to:
sudo "PATH=$PATH" /usr/local/bin/fpsync -n $_thread_count -v -o "-a --stats --numeric-ids --log-file=/tmp/efs-backup.log" /backup/ /mnt/backups/$efsid/$interval.0/ &>/tmp/efs-fpsync.log
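To illustrate why the redirect matters, here is a small sketch that uses a stand-in function instead of fpsync. The function `fake_sync` and the temporary log files are hypothetical and exist only for the demonstration. It uses the portable `>file 2>&1` form, which in bash behaves the same as `&>file`.

```shell
# Hypothetical stand-in for fpsync: writes to both streams, then fails.
fake_sync() {
  echo "synced part 1"
  echo "example failure reported on stderr" >&2
  return 1
}

log_stdout_only=$(mktemp)
log_both=$(mktemp)

# Like the current script line: only stdout lands in the log file,
# so the error message never reaches it.
fake_sync 1>"$log_stdout_only" || true

# Like the proposed change: stdout and stderr both land in the log file.
# The exit status is captured in the same statement.
status=0
fake_sync >"$log_both" 2>&1 || status=$?

echo "exit status: $status"
echo "lines in stdout-only log: $(wc -l < "$log_stdout_only")"
echo "lines in combined log:    $(wc -l < "$log_both")"
```

With the combined redirect the error text ends up next to the normal output in the uploaded log, which is what is missing from the current backup-fpsync logs.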
Additionally, we need to fix the notification, which tells you the window expired when that was not the case; the fpsync failure was the actual issue.
We will address these in the next iteration of the solution.
Do you have a timeframe for the next version? I am also running into this issue on roughly 1 in 3 backup runs.
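As a rough idea of what a corrected notification could look like, the sketch below picks the message from the fpsync exit status and the elapsed time rather than always blaming the backup window. The helper `backup_message`, its arguments, and the wording of the success and fpsync-failure messages are hypothetical; this is not the code that shipped in the solution.

```shell
# Hypothetical helper: choose the notification text.
# Arguments: fpsync exit status, elapsed minutes, backup window in minutes.
backup_message() {
  msg_status=$1
  msg_elapsed=$2
  msg_window=$3
  if [ "$msg_status" -eq 0 ]; then
    echo "The EFS backup completed successfully."
  elif [ "$msg_elapsed" -ge "$msg_window" ]; then
    echo "The EFS backup was incomplete. The backup window expired before the full backup was completed."
  else
    echo "The EFS backup was incomplete. fpsync exited with status $msg_status before the backup window expired."
  fi
}

# Values from the run reported in this issue: status 1, roughly 10 minutes
# elapsed, and a 180-minute window (the unit is an assumption).
message=$(backup_message 1 10 180)
echo "$message"
```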
We have released the next iteration with a fix to the customer notification. We have highlighted the fpsync issue in our solution implementation guide and put recommendations around splitting the workload.
AWS Backup, a fully managed backup service, now enables you to centrally manage backups for Amazon EFS file systems. We recommend that you evaluate AWS Backup for your specific use case before you use this solution.