Gaffer

Provide the ability to remove properties from graph

Open • javadev001001 opened this issue 6 years ago • 6 comments

There should be a way to remove a property that has been used in a Gaffer graph, and has data stored against it, once it is no longer required.

javadev001001, Sep 28 '17

A workaround for now is to rename the property in the schema to "empty" (or something similar) and then tell Gaffer to serialise/deserialise it with a NullSerialiser (see below).

This way the property would never be added to the Properties map on an element, so a user would never see it.

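import uk.gov.gchq.gaffer.exception.SerialisationException;
import uk.gov.gchq.gaffer.serialisation.ToBytesSerialiser;

/**
 * Serialiser for a retired property: it writes no bytes and always reads back null,
 * so the property is never surfaced on an element.
 */
public class NullSerialiser implements ToBytesSerialiser<Object> {
    // Accept any class so the serialiser can be assigned to a property of any type.
    @Override
    public boolean canHandle(final Class clazz) {
        return true;
    }

    // Nothing is written for new values.
    @Override
    public byte[] serialise(final Object value) throws SerialisationException {
        return new byte[0];
    }

    // Whatever bytes are stored (including old data), the value reads back as null.
    @Override
    public Object deserialise(final byte[] bytes) throws SerialisationException {
        return null;
    }

    @Override
    public Object deserialiseEmpty() {
        return null;
    }

    // Every value serialises to the same empty array, so no ordering is broken.
    @Override
    public boolean preservesObjectOrdering() {
        return true;
    }

    // The same input always produces the same (empty) output.
    @Override
    public boolean isConsistent() {
        return true;
    }
}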
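The claim that the property never reaches the Properties map can be illustrated with a simplified, self-contained model of rebuilding properties from stored bytes. This is a sketch under assumptions, not Gaffer's element converter: `PropertyDecodingSketch`, `decodeProperties` and the two `Function` stand-ins for serialisers are hypothetical, and the rule that null values are skipped is assumed to match the store behaviour described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical model (not Gaffer code) of turning stored property bytes back
 * into an element's properties. Assumption: a property whose deserialiser
 * returns null is skipped rather than stored as a null entry.
 */
final class PropertyDecodingSketch {

    // Hypothetical stand-in for a serialiser that behaves like NullSerialiser.
    static final Function<byte[], Object> NULL_DESERIALISER = bytes -> null;

    // Hypothetical stand-in for an ordinary string serialiser.
    static final Function<byte[], Object> STRING_DESERIALISER =
            bytes -> new String(bytes, StandardCharsets.UTF_8);

    static Map<String, Object> decodeProperties(
            final Map<String, byte[]> storedBytes,
            final Map<String, Function<byte[], Object>> deserialisers) {
        final Map<String, Object> properties = new LinkedHashMap<>();
        for (final Map.Entry<String, byte[]> entry : storedBytes.entrySet()) {
            final Object value = deserialisers.get(entry.getKey()).apply(entry.getValue());
            // Null values are never added, so the retired property stays invisible.
            if (value != null) {
                properties.put(entry.getKey(), value);
            }
        }
        return properties;
    }

    public static void main(final String[] args) {
        final Map<String, byte[]> stored = new LinkedHashMap<>();
        stored.put("name", "alice".getBytes(StandardCharsets.UTF_8));
        stored.put("empty", "old data".getBytes(StandardCharsets.UTF_8));

        final Map<String, Function<byte[], Object>> deserialisers = Map.of(
                "name", STRING_DESERIALISER,
                "empty", NULL_DESERIALISER);

        // Prints {name=alice}
        System.out.println(decodeProperties(stored, deserialisers));
    }
}
```

Note that the bytes for `empty` are still present in `stored`; only what a reader sees changes.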

p013570, Sep 28 '17

It depends what "remove the property" means. If it means that the property should never be returned from a query, then the above approach works. If it means that the property should actually be removed from the store, then it won't work. For example, in the AccumuloStore, key-values may not be deserialised and reserialised during a major compaction (if there is a validation filter they will be deserialised, but still not reserialised), so the values of the property will persist in the store indefinitely. To deal with this, a new operation would be required.

@javadev001001 Can you confirm whether the requirement is for the first case, the second, or both?
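The difference between the two readings can be shown with a toy in-memory model. It is a sketch under stated assumptions, not the AccumuloStore: `HideVersusRemoveSketch`, its `rows` list and its methods are hypothetical, and each row is reduced to a map from property name to raw bytes. Hiding filters results at query time, while removal has to rewrite every stored row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model (hypothetical, not the AccumuloStore) contrasting a property that is
 * hidden at query time with one that is removed from storage. Each row maps a
 * property name to its raw stored bytes.
 */
final class HideVersusRemoveSketch {

    final List<Map<String, byte[]>> rows = new ArrayList<>();

    // Query-time hiding: hidden names are filtered from the results,
    // but the stored rows are not modified.
    List<Set<String>> queryPropertyNames(final Set<String> hidden) {
        final List<Set<String>> results = new ArrayList<>();
        for (final Map<String, byte[]> row : rows) {
            final Set<String> names = new TreeSet<>(row.keySet());
            names.removeAll(hidden);
            results.add(names);
        }
        return results;
    }

    // Physical removal: every stored row has to be rewritten.
    void removeProperty(final String property) {
        for (final Map<String, byte[]> row : rows) {
            row.remove(property);
        }
    }

    // True if any stored row still holds bytes for the property.
    boolean anyRowStores(final String property) {
        for (final Map<String, byte[]> row : rows) {
            if (row.containsKey(property)) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        final HideVersusRemoveSketch store = new HideVersusRemoveSketch();
        final Map<String, byte[]> row = new HashMap<>();
        row.put("count", new byte[] {1});
        row.put("empty", new byte[] {9, 9});
        store.rows.add(row);

        System.out.println(store.queryPropertyNames(Set.of("empty"))); // [[count]]
        System.out.println(store.anyRowStores("empty"));               // true: hidden, not removed
        store.removeProperty("empty");
        System.out.println(store.anyRowStores("empty"));               // false
    }
}
```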

gaffer01, Sep 29 '17

@gaffer01 Yes, the ticket is to actually remove the property everywhere.

My workaround above is just an idea to help in the meantime. It would stop the property being returned in queries, but yes, it may remain in the store.

p013570, Oct 02 '17

Removing a property from elements in Accumulo is hard and could be computationally expensive. There are a few options:

  • Inspect the RFiles directly and remove the bytes.
  • MapReduce: read the entire table, transform the elements and write them back out as RFiles, then create a new table and bulk import the RFiles into it.
  • Write a custom Accumulo iterator that modifies the element properties. This would only work on non-groupBy properties, since groupBy properties are serialised into the Accumulo key.
  • Reuse the property for something else: rename it in the schema and create a custom serialiser that can handle both the old type and the new type. When the bytes are read out of Accumulo, the serialiser would determine whether they are the old type and, if so, return null; otherwise it would deserialise the new value.
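The last option can be sketched as a serialiser that accepts both formats. The class and the `MARKER` scheme are hypothetical: the sketch assumes new values are written with a leading marker byte that the old encoding never produces (0xFE never occurs in UTF-8, so this is safe if the old type was a UTF-8 string, but not for arbitrary binary data such as negative longs). It is written as plain static methods rather than a Gaffer `ToBytesSerialiser` so it runs without Gaffer on the classpath.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a serialiser that reads both the retired type's bytes
 * and the new type's bytes. Assumption: new values start with MARKER and old
 * values never do.
 */
final class DualFormatSerialiserSketch {

    // Hypothetical discriminator; choose one the old encoding can never produce.
    static final byte MARKER = (byte) 0xFE;

    // New values are always written in the new, marked format.
    static byte[] serialise(final String value) {
        final byte[] body = value.getBytes(StandardCharsets.UTF_8);
        final byte[] out = new byte[body.length + 1];
        out[0] = MARKER;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    // Old-format bytes map to null (hidden); new-format bytes are decoded.
    static String deserialise(final byte[] bytes) {
        if (bytes.length == 0 || bytes[0] != MARKER) {
            return null;
        }
        return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(final String[] args) {
        // e.g. a big-endian long 42 written by the old serialiser
        final byte[] oldValue = {0, 0, 0, 0, 0, 0, 0, 42};
        System.out.println(deserialise(oldValue));         // null
        System.out.println(deserialise(serialise("new"))); // new
    }
}
```

Like the NullSerialiser workaround, this hides old values rather than deleting the stored bytes.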

p013570, Oct 05 '17

This is linked to #128. Ideally we would create a generic operation that could be applied to any Gaffer store; each store could then choose whether and how to implement the migration.
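The "generic operation, store-specific implementation" idea can be sketched with a default method that rejects the operation unless a store overrides it. All names here (`StoreChoiceSketch`, `PropertyRemovingStore`, `InMemoryStore`, `UnmigratedStore`, `execute`) are hypothetical, and this is not how Gaffer registers operation handlers; it only illustrates the design choice.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one operation for every store, implemented per store. */
final class StoreChoiceSketch {

    // Every store exposes the operation, but by default it is rejected.
    interface PropertyRemovingStore {
        default void removeProperties(final Map<String, List<String>> groupToProperties) {
            throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support RemoveProperties");
        }
    }

    // A store that opts in supplies its own implementation.
    static final class InMemoryStore implements PropertyRemovingStore {
        int calls;

        @Override
        public void removeProperties(final Map<String, List<String>> groupToProperties) {
            calls++; // a real implementation would rewrite its stored elements here
        }
    }

    // A store that has not implemented the migration keeps the default.
    static final class UnmigratedStore implements PropertyRemovingStore { }

    // Generic entry point: callers use the same operation for any store.
    static String execute(final PropertyRemovingStore store,
                          final Map<String, List<String>> groupToProperties) {
        try {
            store.removeProperties(groupToProperties);
            return "done";
        } catch (final UnsupportedOperationException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(final String[] args) {
        final Map<String, List<String>> request = Map.of("BasicEdge", List.of("oldCount"));
        System.out.println(execute(new InMemoryStore(), request));   // done
        System.out.println(execute(new UnmigratedStore(), request)); // rejected: UnmigratedStore does not support RemoveProperties
    }
}
```

Rejecting by default keeps callers on one operation while making it explicit which stores have actually implemented the migration.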

p013570, Oct 05 '17

OK, how about we add a "RemoveProperties" operation and implement it just for the MapStore? Other stores can implement it separately.

The operation could have these two fields:

/**
 * A Map of entity group names to properties to be removed.
 */
private Map<String, List<String>> entityProperties;

/**
 * A Map of edge group names to properties to be removed.
 */
private Map<String, List<String>> edgeProperties;

We should start by validating that the schema actually contains the properties. The store should then fully delete the properties and update all references to the schema accordingly, including the reference in the GraphLibrary.
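The proposed flow (validate against the schema, delete the stored values, then update the schema) can be sketched for an in-memory store. This is a hypothetical model, not Gaffer's MapStore or Schema API: `RemovePropertiesSketch` and its nested `Element` are illustrative names, the schema is reduced to group-to-property-name sets and elements to mutable property maps, and updating other references such as the one in the GraphLibrary is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of a RemoveProperties operation on an in-memory store.
 * Not Gaffer's MapStore: schemas are group -> property names (mutable sets)
 * and elements are simple property maps.
 */
final class RemovePropertiesSketch {

    // Simplified stand-in for a stored element.
    static final class Element {
        final String group;
        final boolean isEdge;
        final Map<String, Object> properties = new HashMap<>();

        Element(final String group, final boolean isEdge) {
            this.group = group;
            this.isEdge = isEdge;
        }
    }

    final Map<String, Set<String>> entitySchema = new HashMap<>();
    final Map<String, Set<String>> edgeSchema = new HashMap<>();
    final List<Element> elements = new ArrayList<>();

    // Step 1: reject the request if any group or property is not in the schema.
    static void validate(final Map<String, Set<String>> schema,
                         final Map<String, List<String>> toRemove) {
        for (final Map.Entry<String, List<String>> entry : toRemove.entrySet()) {
            final Set<String> known = schema.get(entry.getKey());
            if (known == null) {
                throw new IllegalArgumentException("Unknown group: " + entry.getKey());
            }
            for (final String property : entry.getValue()) {
                if (!known.contains(property)) {
                    throw new IllegalArgumentException(
                            "Group " + entry.getKey() + " has no property " + property);
                }
            }
        }
    }

    void removeProperties(final Map<String, List<String>> entityProperties,
                          final Map<String, List<String>> edgeProperties) {
        // Validate both maps before changing anything.
        validate(entitySchema, entityProperties);
        validate(edgeSchema, edgeProperties);

        // Step 2: delete the stored values from every matching element.
        for (final Element element : elements) {
            final Map<String, List<String>> byGroup = element.isEdge ? edgeProperties : entityProperties;
            final List<String> properties = byGroup.get(element.group);
            if (properties != null) {
                element.properties.keySet().removeAll(properties);
            }
        }

        // Step 3: drop the properties from the schema itself.
        entityProperties.forEach((group, props) -> entitySchema.get(group).removeAll(props));
        edgeProperties.forEach((group, props) -> edgeSchema.get(group).removeAll(props));
    }

    public static void main(final String[] args) {
        final RemovePropertiesSketch store = new RemovePropertiesSketch();
        store.edgeSchema.put("BasicEdge", new HashSet<>(Set.of("count", "oldCount")));

        final Element edge = new Element("BasicEdge", true);
        edge.properties.put("count", 3);
        edge.properties.put("oldCount", 7);
        store.elements.add(edge);

        store.removeProperties(Map.of(), Map.of("BasicEdge", List.of("oldCount")));
        System.out.println(edge.properties);                   // {count=3}
        System.out.println(store.edgeSchema.get("BasicEdge")); // [count]
    }
}
```

Both maps are validated before anything is modified, so a typo in either map leaves the stored elements and the schema untouched.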

p013570, Nov 23 '17