gradoop icon indicating copy to clipboard operation
gradoop copied to clipboard

Object reuse leads to failed tests in pattern matching.

Open p-f opened this issue 6 years ago • 1 comments

I was trying to change the org.gradoop.flink.model.impl.functions.epgm.VertexFromId class to reuse objects instead of calling the factory in every map-step:

  /** Reuse. */
  private final EPGMVertex reuseVertex;

  /**
   * Create new function.
   *
   * @param vertexFactory EPGM vertex factory
   */
  public VertexFromId(VertexFactory<EPGMVertex> vertexFactory) {
    reuseVertex = vertexFactory.createVertex();
  }

  @Override
  public EPGMVertex map(Tuple1<GradoopId> gradoopId) throws Exception {
    // Before:
    //return vertexFactory.initVertex(gradoopId.f0);
    reuseVertex.setId(gradoopId.f0);
    return reuseVertex;
  }

This leads to the following tests failing:

  • org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeHomomorphismSetPairBulkTest
    • testGraphElementIdEquality for graphs Graph1_Chain2 and Graph3_Unlabeled0
  • org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeHomomorphismSetPairLoopUnrollingTest
    • testGraphElementIdEquality for graphs Graph1_Chain2 and Graph3_Unlabeled0
  • org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeHomomorphismTriplesForLoopTest
    • testGraphElementIdEquality for graphs Graph1_Chain2 and Graph3_Unlabeled0
  • org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeIsomorphismSetPairBulkTest
    • testGraphElementIdEquality for graphs Graph1_Chain2 and Graph3_Unlabeled0
  • org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeIsomorphismSetPairForLoopTest
    • testGraphElementIdEquality for graphs Graph1_Chain2 and Graph3_Unlabeled0
  • org.gradoop.flink.model.impl.operators.matching.single.preserving.explorative.ExplorativeIsomorphismTriplesForLoopTest
    • testGraphElementIdEquality for graphs Graph1_Chain2 and Graph3_Unlabeled0

All those tests run executeForVertex of ExplorativePatternMatching, where the vertices are put into Tuples after the map step. I think this might be a Flink issue, since reuse for EdgeFromIds works as expected.

p-f avatar Jul 04 '19 12:07 p-f

This is most likely caused by FLINK-1521: In the current implementation of ExplorativePatternMatching, two map functions are chained: https://github.com/dbs-leipzig/gradoop/blob/de49d51f412b2d2595608b787eaa48bf931e2e49/gradoop-flink/src/main/java/org/gradoop/flink/model/impl/operators/matching/single/preserving/explorative/ExplorativePatternMatching.java#L138-L145 The relevant functions being VertexFromId and AddGraphElementToNewGraph.
By changing VertexFromId to reuse objects, we might have triggered the mentioned Flink issue:

  1. In VertexFromId a reused vertex is created.
  2. In each call of that map function, a new ID is assigned.
  3. This map function is chained to AddGraphElementToNewGraph, so that same object is used as the parameter for that function.
  4. That function assignes a new graph head and returns the element.
  5. VertexFromId still holds the reference to that object, it is reused with the graph head added in AddGraphElementToNewGraph.
  6. Repeat from step 2.

With object reuse disabled (which is the default behaviour), a new object (i.e. a copy) should be used as the input for AddGraphElementToNewGraph. This is, however, not the case, see FLINK-1521.
As a result of that, a new graph head is added for call of the AddGraphElementToNewId function, which is the cause of the failed tests.

Flink's Programming Guide states that, with object reuse disabled,

It is not safe to remember objects across function calls.¹

and

An object that was given to a Collector or returned as method result might have changed its value. It is not safe to read an output object.¹

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/batch/index.html#object-reuse-disabled-default

p-f avatar Jul 12 '19 11:07 p-f