efs-backup

Incomplete backups

Open Resisty opened this issue 7 years ago • 7 comments

Hello,

I see that this issue has been closed but I'm experiencing a very similar problem.

I had deployed this solution last week (2018-08-09) and it initially ran successfully, but now reports incomplete backups:

{
  "BackupId": "9d74c8a7",
  "BackupPrefix": "/",
  "BackupStartTime": "2018-08-13T16:08:17",
  "BackupStatus": "Incomplete",
  "BackupStopTime": "2018-08-13T16:19:10",
  "BackupWindow": "180",
  "CreateHardlinksStartTime": "2018-08-13T16:11:12",
  "CreateHardlinksStopTime": "2018-08-13T16:14:39",
  "DestinationEfsId": "fs-387c3591",
  "DestinationEfsSize": 54811144192,
  "DestinationPerformanceMode": "maxIO",
  "EC2Logs": "https://s3.amazonaws.com/nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/ec2-logs/efs-backup-backup-20180813-1619.log",
  "ExpireItem": "1541952496",
  "InstanceType": "c5.xlarge",
  "IntervalTag": "daily",
  "Message": "The EFS backup was incomplete. The backup window expired before the full backup was completed.",
  "NumberOfFiles": 41948,
  "NumberOfFilesTransferred": 1785,
  "RemoveSnapshotStartTime": "2018-08-13T16:10:16",
  "RemoveSnapshotStopTime": "2018-08-13T16:11:12",
  "RetainPeriod": "7",
  "S3BucketSize": 7380943,
  "SourceBurstCreditBalance": 2308974418330,
  "SourceBurstCreditBalancePostBackup": 2308974418330,
  "SourceEfsId": "fs-35aae49c",
  "SourceEfsSize": 59084480512,
  "SourcePerformanceMode": "generalPurpose",
  "SourcePermittedThroughput": 104857600,
  "TotalFileSize": 60585862733,
  "TotalTransferredFileSize": 8997712481
}

I have the backup window set to 6 hours, although the scripts only run for a couple minutes, which seems to contradict the error message.
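To put numbers on how far this run got before it stopped, the timestamps and counters in the record can be compared directly. The following is only a rough sketch: it assumes GNU `date` (for the `-d` option) and `awk` are available, and it hard-codes the values from the record above. It does not interpret `BackupWindow`, since the record does not state its units.

```shell
#!/usr/bin/env bash
# Values copied from the backup record above.
start="2018-08-13T16:08:17"
stop="2018-08-13T16:19:10"
files_total=41948
files_done=1785
bytes_total=60585862733
bytes_done=8997712481

# GNU date (assumed) converts the ISO timestamps to epoch seconds.
start_s=$(date -u -d "$start" +%s)
stop_s=$(date -u -d "$stop" +%s)
elapsed=$((stop_s - start_s))

# awk does the floating-point division that shell arithmetic lacks.
files_pct=$(awk -v a="$files_done" -v b="$files_total" 'BEGIN { printf "%.2f", 100 * a / b }')
bytes_pct=$(awk -v a="$bytes_done" -v b="$bytes_total" 'BEGIN { printf "%.2f", 100 * a / b }')

echo "elapsed: ${elapsed}s ($((elapsed / 60))m $((elapsed % 60))s)"
echo "files transferred: ${files_pct}%"
echo "bytes transferred: ${bytes_pct}%"
```

The two timestamps are 653 seconds apart, far short of a multi-hour window, which is consistent with the run ending for some reason other than the window expiring.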

SSM stderr:

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    100    10  100    10    0     0  16835      0 --:--:-- --:--:-- --:--:-- 10000
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    100    19  100    19    0     0  35514      0 --:--:-- --:--:-- --:--:-- 19000
    kill: sending signal to 15243 failed: No such process

SSM stdout:

    -- 2018-08-13T16:19:08 -- uploading cloud init logs
    Completed 75.6 KiB/75.6 KiB (196.1 KiB/s) with 1 file(s) remaining
    upload: var/log/cloud-init-output.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/ec2-logs/efs-backup-backup-20180813-1619.log
    -- 2018-08-13T16:19:09 -- upload ec2 cloud init logs to S3, status: 0
    -- 2018-08-13T16:19:09 -- uploading backup (fpsync) logs
    Completed 256.0 KiB/314.1 KiB (689.4 KiB/s) with 1 file(s) remaining
    Completed 314.1 KiB/314.1 KiB (745.1 KiB/s) with 1 file(s) remaining
    upload: tmp/efs-backup.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/efs-backup-logs/efs-backup-backup-fpsync-20180813-1619.log
    -- 2018-08-13T16:19:10 -- upload backup fpsync logs to S3 status: 0
    -- 2018-08-13T16:19:10 -- uploading backup (rsync delete) logs
    Completed 139 Bytes/139 Bytes (367 Bytes/s) with 1 file(s) remaining
    upload: tmp/efs-backup-rsync.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/efs-backup-logs/efs-backup-backup-rsync-delete-20180813-1619.log
    -- 2018-08-13T16:19:10 -- upload rsync delete logs to S3 status: 0
    -- 2018-08-13T16:19:10 -- fpsync foreground process-id: 15243
    -- 2018-08-13T16:19:10 -- kill with SIGTERM, status: 0
    -- 2018-08-13T16:19:10 -- exiting loop
    -- 2018-08-13T16:19:10 -- Number of files: 41948
    -- 2018-08-13T16:19:10 -- Number of files transferred: 1785
    -- 2018-08-13T16:19:10 -- Total file size: 60585862733
    -- 2018-08-13T16:19:10 -- Total transferred file size: 8997712481
    -- 2018-08-13T16:19:10 -- source efs BurstCreditBalance after backup: 2.30897441833e+12
    fpsyncStatus: 1
    rsync delete status: 0
    -- 2018-08-13T16:19:10 -- backup finish time: 2018-08-13T16:19:10
    -- 2018-08-13T16:19:10 -- backup incomplete (id: 9d74c8a7)
    -- 2018-08-13T16:19:11 -- dynamo db update status: 0
    -- 2018-08-13T16:19:11 -- updating lifecycle hook
    -- 2018-08-13T16:19:11 -- lifecycle hook update status: 0

efs-backup-backup-20180813-1619.log

EC2 log attached. Any advice or tips would be much appreciated, thanks!

Resisty, Aug 13 '18 17:08

Hello, I can see in the EC2 logs fpsync finished with status 1.

    -- 2018-08-13T16:14:39 -- sudo "PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin" /usr/local/bin/fpsync -n 64 -o "-a --stats --numeric-ids --log-file=/tmp/efs-backup.log" /backup/ /mnt/backups/efs-backup/daily.0/
    which: no mail in (/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin)
    fpsyncStatus:1

I am not sure why, but the backup failed because fpsync did not run successfully. We upload the backup-fpsync logs to S3 as well. Do you have those? Maybe we can get more information there.

As a troubleshooting step, I suggest reproducing what the script does manually. Run the same command the script runs on the EC2 box and check the output:

    sudo "PATH=$PATH" /usr/local/bin/fpsync -n $_thread_count -v -o "-a --stats --numeric-ids --log-file=/tmp/efs-backup.log" /backup/ /mnt/backups/$efsid/$interval.0/ 1>/tmp/efs-fpsync.log
    fpsyncStatus=$?

gsingh04, Aug 16 '18 05:08

Also, can you please tell us whether it reports an incomplete backup every day, or whether the issue is intermittent?

gsingh04, Aug 16 '18 05:08

It's completed successfully the last two days (of course), so it seems to be intermittent.

Here's the efs-backup log from the same run as above. efs-backup-backup-fpsync-20180813-1619.log

Resisty, Aug 16 '18 16:08

If it completed, then those backups are 'good'. I believe we need to change the following line so that both stdout and stderr are written to the log; currently, the backup-fpsync log only has stdout, which is not helpful.

https://github.com/awslabs/efs-backup/blob/255df693bde42d3e4a3114b9c9c05b1dd35b7144/source/scripts/efs-backup-fpsync.sh#L118

Change to:

    sudo "PATH=$PATH" /usr/local/bin/fpsync -n $_thread_count -v -o "-a --stats --numeric-ids --log-file=/tmp/efs-backup.log" /backup/ /mnt/backups/$efsid/$interval.0/ &>/tmp/efs-fpsync.log
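The difference between the two redirections can be seen with a small stand-in command. This is only an illustrative sketch: `noisy` is a hypothetical function standing in for fpsync, and the temporary files are created just for the demo. Note that `&>file` is a bash shorthand; `>file 2>&1` is the portable spelling with the same effect and is what the sketch uses.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for fpsync: writes to both streams, then fails.
noisy() {
  echo "progress on stdout"
  echo "error on stderr" >&2
  return 1
}

out_only=$(mktemp)
both=$(mktemp)

# 1> captures stdout only. stderr is discarded here just to keep the demo
# quiet; in the real script it would go wherever the caller's stderr goes.
noisy 1>"$out_only" 2>/dev/null
status_out_only=$?

# >file 2>&1 (or &>file in bash) captures stdout and stderr together.
noisy >"$both" 2>&1
status_both=$?

echo "stdout-only log:"; cat "$out_only"
echo "combined log:";    cat "$both"
echo "exit statuses: $status_out_only $status_both"
```

In both cases the exit status is read with `$?` on the very next line, before any other command can overwrite it.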

Additionally, we need to fix the notification: it tells you the window expired when that was not the case, and the actual issue was an fpsync failure.

We will address these in the next iteration of the solution.
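One way such a notification could tell the two cases apart is to look at both the fpsync exit status and the elapsed time. The sketch below is not the solution's actual code: `backup_message` and its arguments are hypothetical, and only the window-expired text is taken from the record earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick a status message from the fpsync exit code and
# from whether the configured backup window actually elapsed.
backup_message() {
  fpsync_status=$1   # exit status of fpsync
  elapsed_s=$2       # seconds the backup ran
  window_s=$3        # configured backup window, in seconds
  if [ "$fpsync_status" -eq 0 ]; then
    echo "The EFS backup completed."
  elif [ "$elapsed_s" -ge "$window_s" ]; then
    echo "The EFS backup was incomplete. The backup window expired before the full backup was completed."
  else
    echo "The EFS backup was incomplete. fpsync exited with status $fpsync_status before the backup window expired."
  fi
}

# The run reported above: fpsync status 1 after 653 seconds, 6-hour window.
backup_message 1 653 21600
```

The ordering matters: a nonzero status only counts as a window expiry when the elapsed time has actually reached the window.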

gsingh04, Aug 16 '18 17:08

Do you have a timeframe for the next version? I am also running into this issue on roughly 1 in 3 backup runs.

longwave, Nov 20 '18 09:11

We have released the next iteration with a fix to the customer notification. We have also highlighted the fpsync issue in our solution implementation guide and added recommendations on splitting the workload.

gsingh04, Feb 05 '19 17:02

AWS Backup, a fully managed backup service, now enables you to centrally manage backups for Amazon EFS file systems. We recommend that you evaluate AWS Backup for your specific use case before you use this solution.

gsingh04, Feb 06 '19 16:02