HIVE-25790 Make managed table copies handle updates (FileUtils)
What changes were proposed in this pull request?
Changed FileUtils.copy() to skip files that already exist and are identical in the destination directory, which improves copy performance. Previously, FileUtils.copy() simply removed and recreated the destination directory. With this change, it compares each file and directory and deletes only those that differ.
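For illustration, the skip-identical idea can be sketched roughly as follows. This is not the patch itself: it uses java.nio.file on a local file system instead of the Hadoop FileSystem API, the class and method names (IncrementalCopySketch, copySkippingIdentical, isIdentical) are hypothetical, and treating "identical" as "same size and same bytes" is an assumption made for the sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IncrementalCopySketch {

  // Assumption: two regular files are "identical" when size and bytes match.
  static boolean isIdentical(Path src, Path dst) throws IOException {
    if (!Files.isRegularFile(src) || !Files.isRegularFile(dst)) {
      return false;
    }
    if (Files.size(src) != Files.size(dst)) {
      return false;
    }
    return Arrays.equals(Files.readAllBytes(src), Files.readAllBytes(dst));
  }

  // Snapshot the children first so entries can be deleted safely afterwards.
  static List<Path> listChildren(Path dir) throws IOException {
    try (Stream<Path> children = Files.list(dir)) {
      return children.collect(Collectors.toList());
    }
  }

  static void deleteRecursively(Path p) throws IOException {
    if (Files.isDirectory(p)) {
      for (Path child : listChildren(p)) {
        deleteRecursively(child);
      }
    }
    Files.deleteIfExists(p);
  }

  /** Copies src to dst, leaving identical files in place. Returns the number of files copied. */
  static int copySkippingIdentical(Path src, Path dst) throws IOException {
    if (Files.isDirectory(src)) {
      if (Files.exists(dst) && !Files.isDirectory(dst)) {
        deleteRecursively(dst); // a file where a directory is expected counts as "different"
      }
      Files.createDirectories(dst);
      // Assumption: destination entries with no counterpart in the source are removed.
      for (Path d : listChildren(dst)) {
        if (!Files.exists(src.resolve(d.getFileName()))) {
          deleteRecursively(d);
        }
      }
      int copied = 0;
      for (Path s : listChildren(src)) {
        copied += copySkippingIdentical(s, dst.resolve(s.getFileName()));
      }
      return copied;
    }
    if (isIdentical(src, dst)) {
      return 0; // already copied, e.g. by an earlier attempt that failed later on
    }
    if (Files.exists(dst)) {
      deleteRecursively(dst);
    }
    Files.copy(src, dst);
    return 1;
  }
}
```

In this sketch, calling copySkippingIdentical a second time on an unchanged source copies nothing, which is the behavior a retry is meant to benefit from.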
Why are the changes needed?
In an optimized replication bootstrap scenario, many files, possibly thousands, are copied from source to destination. If the copy fails partway through, it is retried. At that point some files have already been copied, but the current implementation removes them and copies everything again. It should skip the files that were already copied.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This patch adds a few JUnit test scenarios to TestFileUtils and TestCopyUtils. It will also be tested with the automated regression test suites on the test server.
I found a couple of timeouts in the previous run. The cause was that CopyUtils.leaveIdenticalFilesOnly should compare each source path with its corresponding path under the destination path, but it was comparing the source path with the destination path itself. I have fixed this.
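The path mapping involved in that fix can be sketched as below. This is only an illustration under assumptions: it uses java.nio.file.Path rather than Hadoop's Path, and correspondingPath is a hypothetical helper, not a method in CopyUtils.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class CorrespondingPathSketch {

  // Hypothetical helper: maps a file under srcRoot to the same relative
  // location under dstRoot, so the comparison is file-to-file rather than
  // file-to-destination-root. Both paths passed to relativize must be of
  // the same kind (both absolute or both relative).
  static Path correspondingPath(Path srcRoot, Path srcFile, Path dstRoot) {
    return dstRoot.resolve(srcRoot.relativize(srcFile));
  }

  public static void main(String[] args) {
    Path srcRoot = Paths.get("src");
    Path dstRoot = Paths.get("dst");
    Path srcFile = Paths.get("src", "part", "000000_0");
    // Compare srcFile against this path, not against dstRoot itself.
    System.out.println(correspondingPath(srcRoot, srcFile, dstRoot));
  }
}
```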
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.








