
No validation on backup files - roll back not possible

Open starkjs opened this issue 4 years ago • 6 comments

There is no validation of backup files. I have a case where the script filled up the backup path, so a number of jar files were modified without being backed up. That means there is no way to roll back, which is very bad.

if [ ! -f "$targetbackup" ]; then
  echo "Backing up to '$targetbackup'"
  cp -f "$jarfile" "$targetbackup"
fi
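For illustration, a minimal sketch of what validating the backup could look like. This is not the script's actual fix: `backup_and_verify` is a hypothetical helper name, and the sketch assumes that a byte-for-byte comparison with `cmp` is an acceptable check. It verifies only a freshly made copy, because an existing backup may belong to a file that has already been patched.

```shell
# Hypothetical helper: copy src to dest and confirm the copy is intact
# before the caller is allowed to modify src.
backup_and_verify() {
  local src="$1" dest="$2"

  # Leave an existing backup alone: the original may already be patched,
  # so comparing against it would report a false mismatch.
  if [ -f "$dest" ]; then
    return 0
  fi

  echo "Backing up to '$dest'"
  # cp fails on a full disk or a permission error; cmp catches a
  # truncated or otherwise incomplete copy.
  if ! cp -f "$src" "$dest" || ! cmp -s "$src" "$dest"; then
    echo "ERROR: backup of '$src' to '$dest' failed or is incomplete" >&2
    rm -f "$dest"   # drop any partial copy so a rerun retries the backup
    return 1
  fi
}
```

A caller would modify the jar only when the function returns 0, for example `backup_and_verify "$jarfile" "$targetbackup" && echo "safe to modify '$jarfile'"`.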

starkjs avatar Dec 15 '21 11:12 starkjs

Thanks for the report. We are looking into a fix.

jtran-cloudera avatar Dec 15 '21 23:12 jtran-cloudera

No worries @jtran-cloudera

I will submit a PR today; I have a number of fixes. My clients have raised cases with Cloudera too, so it's in the notes for those cases.

starkjs avatar Dec 15 '21 23:12 starkjs

Hi @jtran-cloudera, I am not able to send you my code publicly as it's IP. I have fed the code changes back via our Cloudera Consultant, and he will pass them on via the Cloudera case my client has open. Thanks Josh

starkjs avatar Dec 16 '21 22:12 starkjs

Hi @jtran-cloudera, @sdevineni, I see you added the code to validate the backup file, but it only covers jar files. It's needed for every backup file, including the tar.gz, the nar and the new uberjar code.

I see you also added https://github.com/cloudera/cloudera-scripts-for-log4j/blob/ce8dfbe6e2a2e899306726acd5767668e2b24d23/cm_cdp_cdh_log4j_jndi_removal.sh#L119 for when the code doesn't match the backup. I think that is a bad idea, as it will exit the entire script at that point.

Thanks Josh

starkjs avatar Dec 19 '21 11:12 starkjs

Yes, we are working to update this for nar files as well.

If the backup fails, it could be because of permissions or space-related issues. Hence a fail-fast approach is adopted, so the reason behind the backup failure can be figured out.

sunilgovind avatar Dec 19 '21 16:12 sunilgovind

Hi @sunilgovind,

Sounds good. I have already added the SHA checksum to the tar.gz and nar too.

I disagree. From the point of view of automation, I don't want the script to die. It should report the issue, take no action on the affected file and move on. When you have to run the patch on hundreds or thousands of servers, you don't have time to stop and debug on Production. All testing needs to be done in NonProd, with all the issues sorted out, before running in Production.
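As a rough sketch of the report-and-move-on behaviour described here, under stated assumptions: `process_all` and `patch_file` are hypothetical names, and `patch_file` is only a stand-in for the real per-file work (back up, then modify). Failures are collected and reported at the end, and the non-zero return still lets automation detect that something needs attention.

```shell
# Hypothetical stand-in for the real per-file work (back up, then modify).
# Here it succeeds only if the file exists and is writable.
patch_file() {
  [ -f "$1" ] && [ -w "$1" ]
}

# Process every file, skipping failures instead of exiting the script.
process_all() {
  failed_count=0
  failed_list=""

  for f in "$@"; do
    if ! patch_file "$f"; then
      echo "WARN: skipped '$f', left unmodified" >&2
      failed_count=$((failed_count + 1))
      failed_list="${failed_list}  ${f}
"
    fi
  done

  # Summarise at the end so one bad file does not hide the rest.
  if [ "$failed_count" -gt 0 ]; then
    echo "Completed with $failed_count failure(s):" >&2
    printf '%s' "$failed_list" >&2
    return 1
  fi
  return 0
}
```

The trade-off against fail-fast is that the run always completes, so the summary and the exit status become the only signal that some files were skipped.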

Thanks Josh

starkjs avatar Dec 19 '21 22:12 starkjs