Scanning a table without the versioning iterator can drop keys
I have conclusively demonstrated that if you scan a table without a versioning iterator, and that table contains identical keys with different values, keys will be dropped. I tried both batch scanners and single scanners, and I varied the buffer sizes; the symptoms were the same: keys were lost. I had to resort to reading the rfile directly to see all of the keys I needed for processing. None of the keys have the delete flag set.
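For reference, a direct rfile read can be done with the public RFile API. This is only a minimal sketch of that kind of check, not the exact code I used (the rfile path is passed in as an argument):

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;

public class RFileCount {
  public static void main(String[] args) throws Exception {
    // Direct read of an rfile via the public RFile API, bypassing the
    // tserver scan path entirely. Pass the rfile path as the first argument.
    try (Scanner rf = RFile.newScanner().from(args[0]).build()) {
      long entries = rf.stream().count();
      System.out.println("rfile entries: " + entries);
    }
  }
}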
Accumulo 2.1.4 on Red Hat 8
I have an example of a table with only one file that demonstrates this issue, and a failing JUnit test case is included below.
I expect that a scan of a table without any iterators or any delete keys would be equivalent to a direct scan of the rfiles.
I have noted that, in my example, scanning individual rows directly is less likely to drop keys, while a full scan of the table is more likely to drop them.
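My working theory (detailed at the end of this report) is that every batch boundary is a chance to lose the rest of a run of identical keys, and a full table scan crosses far more batch boundaries than a single-row scan. Here is a rough sketch of one way to probe that, by shrinking the scanner batch size (the client properties path is an argument, the table name "test" is from the script below, and the batch size value is arbitrary):

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.security.Authorizations;

public class BatchBoundaryRepro {
  public static void main(String[] args) throws Exception {
    try (AccumuloClient client = Accumulo.newClient().from(args[0]).build();
        Scanner scanner = client.createScanner("test", Authorizations.EMPTY)) {
      // Smaller batches mean more batch boundaries, hence more resume
      // seeks that can land inside a run of identical keys.
      scanner.setBatchSize(100); // arbitrary; smaller values should lose more keys
      System.out.println("scanned entries: " + scanner.stream().count());
    }
  }
}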
Here is a script that generates the shell commands to create and populate a table with no versioning iterator configured:
#!/bin/bash
echo "createtable test"
echo "config -t test -d table.iterator.majc.vers"
echo "config -t test -d table.iterator.minc.vers"
echo "config -t test -d table.iterator.scan.vers"
for i in {1..100}; do
  for j in {1..10}; do
    declare -i vsize=$(( RANDOM % 2000 ))
    value=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c $vsize)
    echo "insert row$i cf$i cq$i $value -ts 100 -t test"
  done
done
echo "flush test -w"
echo "compact test -w"
Pipe the output of this script into a file and then use the shell's execfile command to run that file. The result should be a test table that contains exactly 1000 entries, and indeed, if you dump the contents of the rfile, you will see all of them. HOWEVER, if you do a full scan of the table using the shell or a scanner (scan -t test -np), you will find that you do not get all of the keys.
Here is a failing JUnit test case too:
@Test
public void testDuplicateTimestampScanLosesKeys() throws Exception {
  ClientContext context = (ClientContext) client;
  final int numRows = 100;
  final int mutationsPerRow = 10;
  final int expectedEntries = numRows * mutationsPerRow;
  SecureRandom random = new SecureRandom();
  byte[] randomValue = new byte[8192];

  client.tableOperations().create(tableName);
  // Drop the versioning iterator from all three scopes so duplicate keys survive.
  Set<String> versionIterProps =
      Set.of("table.iterator.scan.vers", "table.iterator.minc.vers", "table.iterator.majc.vers");
  client.tableOperations().modifyProperties(tableName,
      properties -> properties.keySet().removeAll(versionIterProps));

  // Write ten mutations per row that share row/cf/cq/timestamp but differ in value.
  try (BatchWriter bw = client.createBatchWriter(tableName)) {
    for (int i = 0; i < numRows; i++) {
      for (int j = 0; j < mutationsPerRow; j++) {
        Mutation m = new Mutation("row" + i);
        random.nextBytes(randomValue);
        m.put("cf" + i, "cq" + i, 100L, new Value(randomValue));
        bw.addMutation(m);
      }
    }
  }
  client.tableOperations().flush(tableName, null, null, true);
  client.tableOperations().compact(tableName, new CompactionConfig().setWait(true));

  // Count the entries by reading the rfiles directly while the table is offline.
  client.tableOperations().offline(tableName, true);
  long offlineCount;
  try (OfflineScanner offlineScanner =
      new OfflineScanner(context, context.getTableId(tableName), Authorizations.EMPTY)) {
    offlineCount = offlineScanner.stream().count();
  }

  // Count the entries again with a normal online scan.
  client.tableOperations().online(tableName, true);
  long onlineCount;
  try (Scanner scanner = client.createScanner(tableName, Authorizations.EMPTY)) {
    onlineCount = scanner.stream().count();
  }

  assertEquals(expectedEntries, offlineCount);
  assertEquals(offlineCount, onlineCount, "Online scan lost keys compared to direct RFile scan");
}
And this fails on the final assert with:
org.opentest4j.AssertionFailedError: Online scan lost keys ==>
Expected :1000
Actual :888
From what I have gathered, I think this is what is going on:
When a server cuts a scan batch, it remembers the last key it returned to the client, and when it resumes the scan it seeks exclusively past that remembered key. MemKey is used during the scan and carries a tie-breaker for otherwise-identical keys, the kvCount, but that tie-breaker is not part of what the server remembers. So when the scan resumes, all remaining mutations with the "same" key are skipped.
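If that is right, the skip can be demonstrated with just Key and Range, mirroring the exclusive-start range I believe the server builds on resume (the key values here are made up):

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;

public class ResumeSkipDemo {
  public static void main(String[] args) {
    // The last key returned to the client before the batch was cut.
    Key last = new Key("row1", "cf1", "cq1", 100L);
    // The resumed scan effectively seeks with an exclusive start at that key.
    Range resume = new Range(last, false, null, false);
    // A remaining entry with the same row/cf/cq/timestamp but a different
    // value compares equal to `last` (Key has no kvCount tie-breaker),
    // so the resumed range excludes it.
    Key duplicate = new Key("row1", "cf1", "cq1", 100L);
    System.out.println(resume.contains(duplicate)); // false -> entry skipped
  }
}

Because Key comparison only considers row/cf/cq/cv/timestamp, all ten duplicates inserted per row sort as the same key, and the exclusive restart discards however many of them remain.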