
Does unipop support modeling the data of a single table as a 'virtual' graph?

Open sorryya opened this issue 7 years ago • 3 comments

For example, if an Elasticsearch document contains some vertices and edges, and each vertex or edge is represented by a set of fields from the document, how should the mapping file be written?

sorryya, Nov 08 '17 11:11

@sorryya Did you mean this kind of mapping: Inner Edges?

seanbarzilay, Nov 09 '17 11:11

I mean an Elasticsearch document like this:

{
    "_index": "xxx",
    "_type": "yyy",
    "_id": "AV-VSXTUbcKGrP6qekMg",
    "_source": {
        "field_1": "1111",
        "field_2": "2222",
        "field_3": "3333",
        "field_4": "4444",
        "field_5": "5555",
        "field_6": "6666",
        "field_7": "7777",
        "field_8": "8888",
        "field_9": "9999"
    }
}

My scenario:

  1. Each document represents an event, and I want to model a graph of the co-occurrence relations among the objects in the event.
  2. Fields describing the event map to edges; fields describing the objects map to vertices.
  3. So one field may serve as an id or a property of several edges or vertices, and a field used as a vertex id may have duplicate values across documents.
  4. An "id" may be composed from a set of fields.
  5. The "index" should cover all indices, or a subset of indices, in Elasticsearch.
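A minimal Python sketch of the projection described in points 1 to 4, not unipop code. It assumes, purely for illustration, that field_4 is one vertex id, that field_6 and field_7 form a composite vertex id joined with "+", and that each document yields one edge whose id is the document _id. The helper names composite_id, project and event are hypothetical. It shows why a vertex id repeated across documents collapses to a single vertex while every document still contributes its own edge.

```python
from typing import Dict, List, Tuple

VertexKey = Tuple[str, str]  # (label, id)


def composite_id(source: Dict[str, str], fields: List[str], delimiter: str = "+") -> str:
    """Join several field values into one id, e.g. "6666+7777" (hypothetical scheme)."""
    return delimiter.join(source[f] for f in fields)


def project(docs: List[dict]) -> Tuple[Dict[VertexKey, dict], List[dict]]:
    """Turn each event document into one edge plus its two endpoint vertices."""
    vertices: Dict[VertexKey, dict] = {}
    edges: List[dict] = []
    for doc in docs:
        src = doc["_source"]
        out_key = ("lable_v1", src["field_4"])                            # single-field id
        in_key = ("lable_v2", composite_id(src, ["field_6", "field_7"]))  # composite id
        # setdefault keeps the first occurrence, so an id repeated across
        # documents yields one vertex rather than one vertex per document.
        vertices.setdefault(out_key, {"field_5": src["field_5"]})
        vertices.setdefault(in_key, {})
        edges.append({
            "id": doc["_id"],  # every document contributes its own edge
            "label": "lable_e1",
            "out": out_key[1],
            "in": in_key[1],
            "properties": {k: src[k] for k in ("field_1", "field_2", "field_3")},
        })
    return vertices, edges


def event(doc_id: str, field_7: str) -> dict:
    """Build a sample event document ("1111" .. "9999"); only field_7 varies."""
    source = {f"field_{i}": str(i) * 4 for i in range(1, 10)}
    source["field_7"] = field_7
    return {"_id": doc_id, "_source": source}


vertices, edges = project([event("a", "7777"), event("b", "0000")])
print(len(edges), len(vertices))  # 2 3
```

Both sample events share field_4 = "4444", so they produce two edges but only one lable_v1 vertex, plus two distinct lable_v2 vertices.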

Can the mapping file look like this?

{
  "class": "org.unipop.elastic.ElasticSourceProvider",
  "clusterName": "escluster",
  "addresses": "http://localhost:9200",
  "edges": [
    {
      "index": "*",
      "id": {
        "fields": ["some_value", "@_id"],
        "delimiter": "+"
      },
      "label": "lable_e1",
      "properties": {
        "field_1": "@field_1",
        "field_2": "@field_2",
        "field_3": "@field_3"
      },
      "outVertex":{
        "ref": false,
        "id": "@field_4",
        "label": "lable_v1",
        "properties": {
          "field_5": "@field_5"
        }
      },
      "inVertex":{
        "ref": false,
        "id": {
          "fields": ["@field_6", "@field_7"],
          "delimiter": "+"
        },
        "label": "lable_v2",
        "properties": {
          "property_name": {
            "fields": ["@field_6", "@field_7"],
            "delimiter": "+"
          }
        }
      }
    },
    {
      "index": "*",
      "id": "@_id",
      "label": "lable_e2",
      "properties": {
        "field_1": "@field_1",
        "field_2": "@field_2",
        "field_5": "@field_4",
        "field_7": "@field_8"
      },
      "outVertex":{
        "ref": false,
        "id": "@field_4",
        "label": "lable_v3",
        "properties": {
          "field_5": "@field_5"
        }
      },
      "inVertex":{
        "ref": false,
        "id": "@field_9",
        "label": "lable_v4",
        "properties": {
          "field_9": "@field_9"
        }
      }
    }
  ]
}
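The mapping above relies on two conventions: a string starting with "@" refers to a document field ("@_id" being the document id), and a {"fields", "delimiter"} object builds a composite value. The Python sketch below shows one way those conventions could be resolved against a single document. It is an assumption-laden illustration, not unipop's parser: the composite-id object, treating a bare string such as "some_value" as a constant, and the helper names resolve and build_element are all hypothetical, and "ref" is ignored.

```python
import json
from typing import Any, Dict


def resolve(spec: Any, doc: Dict[str, Any]) -> Any:
    """Resolve one mapping value against a document (assumed semantics):

    "@name"                             -> value of field `name` ("@_id" -> document id)
    {"fields": [...], "delimiter": "+"} -> resolved parts joined by the delimiter
    anything else                       -> returned unchanged, as a constant
    """
    if isinstance(spec, dict) and "fields" in spec:
        return spec["delimiter"].join(str(resolve(part, doc)) for part in spec["fields"])
    if isinstance(spec, str) and spec.startswith("@"):
        name = spec[1:]
        return doc["_id"] if name == "_id" else doc["_source"][name]
    return spec


def build_element(mapping: Dict[str, Any], doc: Dict[str, Any]) -> Dict[str, Any]:
    """Materialize an edge or vertex; nested outVertex/inVertex are built recursively."""
    element = {
        "id": resolve(mapping["id"], doc),
        "label": mapping["label"],
        "properties": {k: resolve(v, doc) for k, v in mapping.get("properties", {}).items()},
    }
    for side in ("outVertex", "inVertex"):
        if side in mapping:
            element[side] = build_element(mapping[side], doc)
    return element


# A trimmed copy of the first edge mapping ("index" and "ref" omitted).
edge_mapping = json.loads("""
{
  "id": {"fields": ["some_value", "@_id"], "delimiter": "+"},
  "label": "lable_e1",
  "properties": {"field_1": "@field_1"},
  "outVertex": {"id": "@field_4", "label": "lable_v1", "properties": {"field_5": "@field_5"}},
  "inVertex": {"id": {"fields": ["@field_6", "@field_7"], "delimiter": "+"}, "label": "lable_v2"}
}
""")
doc = {
    "_id": "AV-VSXTUbcKGrP6qekMg",
    "_source": {"field_1": "1111", "field_4": "4444", "field_5": "5555",
                "field_6": "6666", "field_7": "7777"},
}
edge = build_element(edge_mapping, doc)
print(edge["id"])              # some_value+AV-VSXTUbcKGrP6qekMg
print(edge["inVertex"]["id"])  # 6666+7777
```

Parsing the mapping with a strict JSON parser such as json.loads is also a quick way to catch syntax errors like trailing commas before handing the file to unipop.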

With this mapping, I have run into the following problems:

  1. If I set "ref" to false in "outVertex" or "inVertex", it throws java.lang.NullPointerException.
  2. The edge count is much lower than it should be: g.E().count() returns 9881, but the Elasticsearch document count is 7242721.
  3. If I define all vertices inside edges in the mapping file, g.V().count() returns 0.
  4. If I define vertices independently (not inside edges), the vertex count is much lower than it should be: g.V().values("field_4").count() returns 251, but the distinct count of field_4 (used as both the vertex id and a property) in Elasticsearch is 753.
  5. When I query with has(...), I get no results.
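To judge the gaps in points 2 and 4, it helps to pull the reference numbers straight from Elasticsearch. The Python sketch below only builds request bodies for two standard Elasticsearch APIs, `_count` and the `cardinality` aggregation; it sends nothing. The index pattern `*` mirrors the mapping's "index", while the `.keyword` sub-field (present for string fields under default dynamic mapping), the aggregation name `distinct_ids`, and the helper names are assumptions about this setup. If each edge mapping yields one edge per document (also an assumption), g.E().count() would be expected to approach the document count times the number of edge mappings.

```python
import json
from typing import Any, Dict


def doc_count_body() -> Dict[str, Any]:
    """Body for GET /<index>/_count: counts every document in the target indices."""
    return {"query": {"match_all": {}}}


def distinct_count_body(field: str, precision_threshold: int = 3000) -> Dict[str, Any]:
    """Body for GET /<index>/_search approximating the number of distinct values of `field`.

    `cardinality` uses HyperLogLog++: counts below `precision_threshold` are expected
    to be close to accurate; Elasticsearch caps the threshold at 40000.
    """
    return {
        "size": 0,  # skip the hits, return only the aggregation
        "aggs": {
            "distinct_ids": {  # hypothetical aggregation name
                "cardinality": {"field": field, "precision_threshold": precision_threshold}
            }
        },
    }


# Compare with g.E().count(): send to GET /*/_count.
print(json.dumps(doc_count_body()))
# Compare with vertex counts keyed on field_4; ".keyword" assumes default dynamic mapping.
print(json.dumps(distinct_count_body("field_4.keyword")))
```

If these numbers match the figures above, the discrepancy most likely lies in how the mapping is evaluated rather than in the data itself.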

sorryya, Nov 10 '17 06:11

@sorryya I haven't tested a schema in which both vertices of an edge are non-reference vertices, so I will fix this and release a patch in the next few days.

seanbarzilay, Nov 15 '17 09:11