Possible RemoveGraph bug wrt Accumulo tables & changing schemas

Open n3101 opened this issue 3 years ago • 4 comments

From a non-github user:

"If I execute RemoveGraph, then by default the underlying Accumulo table will remain. If I then execute AddGraph using the existing table name then happy days, I can re-access my data. We've been using this mechanism to make subtle changes to the schema. Obviously needs a certain level of compatibility or bad things will happen but e.g. for additive changes this approach works well.

Turns out, however, there's a problem with this approach. In this scenario the original schema remains set as Accumulo configuration properties and is hence what is enforced e.g. at compaction. This can then vary from the schema that's being applied on ingest (via the API). I can't understand how that is a good place to be, so I consider this a bug, but is it actually intended behaviour? It seems sufficiently bad to diverge schemas like this that I'd suggest one of two alternatives: A) when you re-add the graph, and Gaffer notices there's an existing Accumulo table there, it compares the schema and gets grumpy with you if it's not the same as that set on the existing table. B) when you re-add the graph, and Gaffer notices there's an existing Accumulo table there, it updates the version of the schema held in its config properties (exactly the same as it would have set it in the first place if it was a new graph). I'd strongly favour B). Any thoughts?"
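For illustration only, the two alternatives above can be sketched in plain Java. This is not Gaffer or Accumulo code: the class `SchemaReAddSketch`, the `Policy` enum, the `SCHEMA_KEY` property name and the `reAddGraph` method are all hypothetical, and the table's configuration is modelled as a simple map with the schema held as a string. The sketch only shows the decision each alternative would make when a graph is re-added over an existing table.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaReAddSketch {

    /** Hypothetical names for alternatives A) and B) from the issue. */
    public enum Policy { REJECT_ON_MISMATCH, OVERWRITE_TABLE_SCHEMA }

    // Hypothetical property key, NOT the key Gaffer really uses.
    public static final String SCHEMA_KEY = "example.schema";

    /**
     * Models re-adding a graph over a table whose configuration is tableProps.
     * Assumption: the schema is compared as an opaque string.
     */
    public static void reAddGraph(Map<String, String> tableProps,
                                  String newSchema,
                                  Policy policy) {
        String stored = tableProps.get(SCHEMA_KEY);

        // New table, or alternative B): store the supplied schema on the table.
        if (stored == null || policy == Policy.OVERWRITE_TABLE_SCHEMA) {
            tableProps.put(SCHEMA_KEY, newSchema);
            return;
        }

        // Alternative A): refuse to proceed if the schemas differ.
        if (!stored.equals(newSchema)) {
            throw new IllegalStateException(
                "Supplied schema differs from the schema stored on the existing table");
        }
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new HashMap<>();
        tableProps.put(SCHEMA_KEY, "schema-v1");

        // Alternative B): the table-side schema is brought in line with the new one.
        reAddGraph(tableProps, "schema-v2", Policy.OVERWRITE_TABLE_SCHEMA);
        System.out.println("Stored after B): " + tableProps.get(SCHEMA_KEY));

        // Alternative A): a differing schema is rejected and the table is left alone.
        try {
            reAddGraph(tableProps, "schema-v3", Policy.REJECT_ON_MISMATCH);
        } catch (IllegalStateException e) {
            System.out.println("Rejected by A): " + e.getMessage());
        }
    }
}
```

Under either policy the schema enforced on the table side and the schema applied on ingest cannot silently diverge, which is the problem described above. Neither policy checks whether the change is compatible; that would still need separate handling.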

n3101 avatar Nov 10 '21 10:11 n3101

@d21211122 suggested: "... longer term, I'd prefer it if we didn't allow the update at all, and provided a specific 'update schema' operation for this which does all the compatibility checking for you... and then we can make 'remove graph' actually remove the graph..."

n3101 avatar Nov 10 '21 11:11 n3101

We'll go with the suggestion and provide "... a specific 'update schema' operation for this which does all the compatibility checking for you... and then we can make 'remove graph' actually remove the graph..."

See also #2352-#2356

n3101 avatar Feb 02 '22 16:02 n3101

The motivation behind all this is a need to be able to mark records for deletion. The process above is a scary way of going about this. Instead we will look at a better solution to that problem before deciding how to proceed here.

n3101 avatar Mar 22 '22 14:03 n3101

See #3004 for where this deletion concept is explored further.

GCHQDeveloper314 avatar Mar 15 '24 17:03 GCHQDeveloper314