
Make exportTable command volume aware

Open EdColeman opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe. The exportTable command does not take multiple volumes into account: it creates a single distcp.txt file that does not distinguish between volumes. When exporting a table that uses more than one volume, that file must be post-processed to split the file list by volume so that each list can be fed to a separate distcp command.
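
As a stopgap until the command does this itself, the single distcp.txt can be split in a post-processing step. Below is a minimal Java sketch, assuming each line of distcp.txt is a fully qualified URI (scheme and authority present). The class and method names (`DistcpSplitter`, `splitByFileSystem`) and the example paths are hypothetical and not part of Accumulo. It groups by scheme plus authority, i.e. by the file system each path lives on, rather than by Accumulo volume.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DistcpSplitter {

    // Hypothetical helper: reduces a URI to "scheme://authority", which
    // identifies the file system a path lives on (not the Accumulo volume).
    static String fileSystemKey(URI uri) {
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
        return scheme + "://" + authority;
    }

    // Groups the lines of a single distcp.txt into one list per file system,
    // preserving the order in which file systems first appear.
    static Map<String, List<String>> splitByFileSystem(List<String> lines) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String key = fileSystemKey(URI.create(trimmed));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(trimmed);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical distcp.txt contents for a table spread over two namenodes.
        List<String> distcp = List.of(
            "hdfs://nn1:8020/accumulo/tables/2/default_tablet/F0000a.rf",
            "hdfs://nn2:8020/accumulo/tables/2/t-0001/F0000b.rf",
            "hdfs://nn1:8020/export/exportMetadata.zip");
        splitByFileSystem(distcp).forEach((fs, files) ->
            System.out.println(fs + " -> " + files.size() + " file(s)"));
    }
}
```

Each resulting list could then be written to its own file and passed to a separate `hadoop distcp -f <list> <dest>` invocation.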

Describe the solution you'd like. Have the command generate one distcp file list per volume. The solution should work well with the importTable command. It may also be useful to preserve the exported metadata and the distcp file lists after the import, so they are available for validation afterwards.

Additional context. This is a follow-on to #2869, #2871, and #2849 (offline option), which implement workarounds for exportTable and importTable not being volume aware. It was inspired by changes @drewfarris submitted for importTable to handle multiple directories.

EdColeman, Aug 10 '22 21:08

I have logic in place and starting to test the changes now.

AlbertWhitlock, Feb 17 '23 16:02

In discussions on #3228, I've become aware that "Volume aware" isn't quite the right term for what is needed; it should be "FileSystem aware". Files need to be grouped by Hadoop FileSystem instances (not just FileSystem types), regardless of how they map to Accumulo Volume instances. The limitation on distcp is copying across Hadoop FileSystems. DistCp has no knowledge or concept of Accumulo Volumes.
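
To illustrate the distinction, here is a minimal Java sketch with three hypothetical Accumulo volumes on two HDFS namenodes. Grouping by volume gives three lists, grouping by FileSystem type (scheme only) gives one, and grouping by FileSystem instance gives two. The sketch approximates a FileSystem instance as scheme plus authority; Hadoop's own FileSystem cache also takes the calling user into account, which is ignored here. The class and method names are illustrative only.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class VolumeGrouping {

    // Assumption: a FileSystem instance is identified by "scheme://authority".
    static String fileSystemKey(String volume) {
        URI uri = URI.create(volume);
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
        return scheme + "://" + authority;
    }

    // FileSystem type only (e.g. "hdfs"), shown for comparison.
    static String schemeKey(String volume) {
        String scheme = URI.create(volume).getScheme();
        return scheme == null ? "" : scheme.toLowerCase(Locale.ROOT);
    }

    static Map<String, List<String>> groupBy(List<String> volumes, Function<String, String> key) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String volume : volumes) {
            groups.computeIfAbsent(key.apply(volume), k -> new ArrayList<>()).add(volume);
        }
        return groups;
    }

    static Map<String, List<String>> groupByFileSystem(List<String> volumes) {
        return groupBy(volumes, VolumeGrouping::fileSystemKey);
    }

    static Map<String, List<String>> groupByScheme(List<String> volumes) {
        return groupBy(volumes, VolumeGrouping::schemeKey);
    }

    public static void main(String[] args) {
        // Hypothetical volumes: two on nn1, one on nn2.
        List<String> volumes = List.of(
            "hdfs://nn1:8020/accumulo",
            "hdfs://nn1:8020/accumulo-b",
            "hdfs://nn2:8020/accumulo");
        System.out.println("by volume:      " + volumes.size() + " group(s)");
        System.out.println("by FS type:     " + groupByScheme(volumes).size() + " group(s)");
        System.out.println("by FS instance: " + groupByFileSystem(volumes).size() + " group(s)");
    }
}
```

Under this approximation, the FileSystem-instance grouping yields the number of separate distcp file lists needed: here two, one per namenode.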

ctubbsii avatar Aug 02 '23 08:08 ctubbsii