gradoop
gradoop copied to clipboard
Object reuse leads to failed tests in pattern matching.
I was trying to change the org.gradoop.flink.model.impl.functions.epgm.VertexFromId
class to reuse objects instead of calling the factory in every map-step:
/** Reuse. */
private final EPGMVertex reuseVertex;
/**
* Create new function.
*
* @param vertexFactory EPGM vertex factory
*/
public VertexFromId(VertexFactory<EPGMVertex> vertexFactory) {
reuseVertex = vertexFactory.createVertex();
}
@Override
public EPGMVertex map(Tuple1<GradoopId> gradoopId) throws Exception {
// Before:
//return vertexFactory.initVertex(gradoopId.f0);
reuseVertex.setId(gradoopId.f0);
return reuseVertex;
}
This leads to the following tests failing:
org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeHomomorphismSetPairBulkTesttestGraphElementIdEqualityfor graphsGraph1_Chain2andGraph3_Unlabeled0
org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeHomomorphismSetPairLoopUnrollingTesttestGraphElementIdEqualityfor graphsGraph1_Chain2andGraph3_Unlabeled0
org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeHomomorphismTriplesForLoopTesttestGraphElementIdEqualityfor graphsGraph1_Chain2andGraph3_Unlabeled0
org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeIsomorphismSetPairBulkTesttestGraphElementIdEqualityfor graphsGraph1_Chain2andGraph3_Unlabeled0
org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeIsomorphismSetPairForLoopTesttestGraphElementIdEqualityfor graphsGraph1_Chain2andGraph3_Unlabeled0
org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeIsomorphismTriplesForLoopTesttestGraphElementIdEqualityfor graphsGraph1_Chain2andGraph3_Unlabeled0
All those tests run executeForVertex of ExplorativePatternMatching, where the vertices are put into Tuples after the map step. I think this might be a Flink issue, since reuse for EdgeFromIds works as expected.
This is most likely caused by FLINK-1521:
In the current implementation of ExplorativePatternMatching, two map functions are chained:
https://github.com/dbs-leipzig/gradoop/blob/de49d51f412b2d2595608b787eaa48bf931e2e49/gradoop-flink/src/main/java/org/gradoop/flink/model/impl/operators/matching/single/preserving/explorative/ExplorativePatternMatching.java#L138-L145
The relevant functions being VertexFromId and AddGraphElementToNewGraph.
By changing VertexFromId to reuse objects, we might have triggered the mentioned Flink issue:
- In
VertexFromIda reused vertex is created. - In each call of that
mapfunction, a newIDis assigned. - This
mapfunction is chained toAddGraphElementToNewGraph, so that same object is used as the parameter for that function. - That function assignes a new graph head and returns the element.
VertexFromIdstill holds the reference to that object, it is reused with the graph head added inAddGraphElementToNewGraph.- Repeat from step 2.
With object reuse disabled (which is the default behaviour), a new object (i.e. a copy) should be used as the input for AddGraphElementToNewGraph. This is, however, not the case, see FLINK-1521.
As a result of that, a new graph head is added for call of the AddGraphElementToNewId function, which is the cause of the failed tests.
Flink's Programming Guide states that, with object reuse disabled,
It is not safe to remember objects across function calls.¹
and
An object that was given to a Collector or returned as method result might have changed its value. It is not safe to read an output object.¹
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/batch/index.html#object-reuse-disabled-default