accumulo icon indicating copy to clipboard operation
accumulo copied to clipboard

Update export version to handle imports of files without fenced ranges

Open EdColeman opened this issue 1 year ago • 5 comments

Exports created with versions before 3.1 will not contain range fenced information. This change allows imports of exported with Accumulo versions prior to 3.1 to be imported into 3.1+.

  • updated export version to 2 as an optimization so files exported with 3.1+ can be assumed to include fenced ranges.
  • moved the need conversion check to StoredTabletFile for reuse. (also moves tests)

Also verified using uno with exports created with versions 2.1.3 and 3.1 and imported into 3.1 (with these changes)

Closes #3849

EdColeman avatar Dec 29 '23 16:12 EdColeman

The testing difficulty is that the files need to be written in a prior version, I suppose that the files could be created using 2.1 and added as a resource and preserved in Github. I'm not aware of other places where we have done that. Or maybe there is another way?

The changes were testing using uno, but I do agree that is less than satisfying.

EdColeman avatar Jan 20 '24 18:01 EdColeman

Yeah, in this case I think it might be ok to just create small files with 2.1 and commit it as part of the test resources. I have done that before in other projects and I don't think it's a problem here. The only other option is to write code to manually write out the test files each time in the old format which doesn't sound like fun. Since we are just checking if the legacy conversion works seems like just uploading an old file format would be fine. @ctubbsii - Do you have any issue if we uploaded test files created by an older version to check the conversion?

cshannon avatar Jan 20 '24 18:01 cshannon

Yeah, in this case I think it might be ok to just create small files with 2.1 and commit it as part of the test resources.

We already have some data like that for testing old rfile formats. So there is precedent for that pattern and I think it would be a good thing to do. Could place a comment in the test showing how the data was produced.

keith-turner avatar Jan 20 '24 19:01 keith-turner

Import test case added in 47f634d3eecae78a. Includes README in the data dir to create the files with shell commands.

EdColeman avatar Jan 21 '24 02:01 EdColeman

One quirk of adding the test files where they currently are, the license check shows a warning because it does not know how to check a *.rf file. (The build completes though) Unsure if a general exclusion for all rfiles, or something more targeted to the file / directory location would be appropiate. @ctubbsii do you have a suggestion or preference?

Another curiosity is that the licence check inserted the ASL into the export distcp.txt file, but it does not seem to matter to the import code.

The maven warnings:

[INFO] --- license:4.3:format (license-headers) @ accumulo-test ---
[INFO] Updating license headers...
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A0000008.rf
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A000000a.rf
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A000000b.rf
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A0000009.rf
[WARNING] Unable to find a comment style definition for some files. You may want to add a custom mapping for the relevant file extensions.

EdColeman avatar Jan 22 '24 22:01 EdColeman

Unsure if a general exclusion for all rfiles, or something more targeted to the file / directory location would be appropiate.

Either are fine. I can't imagine we'd ever have a file ending in .rf where we'd want the plugin to add a license header to it, so can probably just exclude them all **/*.rf in the parent POM (and remove any more specific .rf exclusions from any child POMs... I know at least one exists in core/pom.xml).

Another curiosity is that the licence check inserted the ASL into the export distcp.txt file, but it does not seem to matter to the import code.

That one should also be excluded. The format of that file should be one file per line, and the comments it added aren't clearly identifiable as a comment, so we should just avoid it. It also doesn't matter, because there's no copyrightable content in that file anyway. It's just a list of files.

ctubbsii avatar Jan 25 '24 23:01 ctubbsii