Gaffer
Gaffer copied to clipboard
FederatedStore FederatedGraphStorage has Accumulo specific code
The Accumulo specific code in FederatedGraphStorage fixes mismatch in tables and graph name.
- Does a similar problem effect other stores?
- Does this code belong somewhere else
Possible Likely solution. There should be a mapping of GraphIds to TableID names so map can be updated without needing to rename tables.
link: gh-2435
For reference: https://github.com/gchq/Gaffer/blob/23c323e894c8dba3e25029c3699b6fb5a050a39e/store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java#L612-L638
@d21211122 said: "I'm not sure what the solution should be, but I remember going through this problem and we definitely need a solution, and there shouldn't be Accumulo specific code in any of the federated code"
I came across this problem when adding Kerberos support to the Accumulo Store. This Federated Store Accumulo connection needed to be changed so it would not rely on username/password authentication. This was done easily enough by making the Accumulo TableUtil accept the Accumulo Properties object directly.
The solution for removing this Accumulo code from the Federated Store seems to me to be adding an ability to rename graphs stored in Accumulo into the Accumulo Store TableUtil class. Then the Federated Store can just call something like this, instead of having to import any Accumulo libraries itself.
//Update Tables
String storeClass = graphToMove.getStoreProperties().getStoreClass();
if (nonNull(storeClass) && storeClass.startsWith(AccumuloStore.class.getPackage().getName())) {
AccumuloProperties tmpAccumuloProps = (AccumuloProperties) graphToMove.getStoreProperties();
TableUtils.renameTable(tmpAccumuloProps, graphId, newGraphId)
}
TableUtil has a createTable method so a renameTable method would fit in fine.
@GCHQDeveloper314 That would remove the direct Accumulo library usage, but there is still Accumulo store specific code. I think a more generic fix could be to implement a rename graph operation
The way forward does look to be to add a proper way to rename graphs, but this seems to be a little more complex than it appears.
The federated store considers the table name to always be the graphId and graphId is used to refer to the name of the associated Accumulo table. But in the Accumulo store, the tableName is only the same as the graphId when there is no graph namespace:
https://github.com/gchq/Gaffer/blob/2c1ac344070389d95f583a9b661fc1b250f12b23/store-implementation/accumulo-store/src/main/java/uk/gov/gchq/gaffer/accumulostore/AccumuloStore.java#L201-L205
The graphId is set in the store properties, it seems wrong to be changing the table name associated with a graph without changing the properties of that graph to match - with the ChangeGraphId operation this is possible. To resolve this issue requires more understanding of why changing the graphId is useful or required. E.g: If it's required because the store properties changed, then this should be the source of the new name and the functionality to fix this can be moved into the Accumulo store.
@GCHQDev404 Could you explain why/when the ChangeGraphId operation is used?
One example is Users wishing to switch in graphs, but require operations with graphIds to continue to work. Users wishing to update a graph name when multiple version exist (testGraph1, testGraph2, testGraph2Bigger) and they wish to keep one with data and remove the rest.
I haven't written a ticket for this yet, but a likely solution is to have Mapping of "TableNames to GraphIds" which FederatedStore can internally update a graph name graphA in the map without requiring any changes to Accumulo table name 91234810932.
This creates a small headache for Accumulo Admins which have tables with no names
Possible solution is similar to the solution and approach at FederatedStore RemoveGraphDeleteAccumuloTableHandler #2632
The idea for mapping static Accumulo table names to renamable GraphIds is how this should have been done originally. However making this change now would be disruptive and would require migration for any existing Accumulo backed graphs. There are also certain undocumented practices which could stop working if this were adopted.
In particular if table name is mapped to a unique graphId, what should happen to this mapping if the graph is removed. Or if the table no longer exists when should the mapping be cleaned up.
Changing graphId is something which should be handled in the core Store/Graph code, federated store should be able to simply call a method to get the graphId changed. This could also be standard handler applicable to all graphs.
As stated above, this further work can be raised as a new issue.