
Property data corruption after graph rebuild

Open xujiaxj opened this issue 5 years ago • 6 comments

RedisGraph version: 2.2.6
JRedisGraph version: 2.1.0

For simplicity, let's say we have two Java services talking to a RedisGraph server: one called GraphBuilder, the other called GraphQuerier. Both use the JRedisGraph client. Our GraphBuilder has a timer task that builds the graph once every minute. We use the MERGE command, so it's essentially a timer task that keeps "upserting" the graph.

Everything runs on K8s. When the Redis server crashes, its pod is restarted and the graph is gone, but the GraphBuilder rebuilds it in no time. When this happens, queries run on the GraphQuerier can return corrupted data. In particular, the property names are messed up.

For instance, if the GraphQuerier has not been restarted after the Redis crash, it prints a list of properties whose names are all wrong:

properties={name=Service, component=String, instance=String, memory_limits=String, pod=workload | deployment | statefulset | daemonset | replicaset, cpu_limits=String, access_mode=String, _created=1602617008984, _updated=1602617495780}

But if we restart the GraphQuerier, or deploy a new GraphQuerier pod, and query the same node on the same graph, we will get the correct result:

properties={name=Service, app=String, kube_service=String, namespace=String, lookup_workload=workload | deployment | statefulset | daemonset | replicaset, workload=String, workload_type=String, _created=1602617008984, _updated=1602617495780}

Note that for some properties we store Java data type names like String as the values, so don't be confused by that.

The query we run to fetch the data is very simple, like

MATCH (n:EntityType {name:$param0}) WHERE (n._updated >= $param1 AND n._created <= $param2) RETURN n

Does the JRedisGraph client keep some form of cached mapping from property IDs to property names? If the graph is rebuilt, that mapping would be out of date.

We don't have a way to restart all our services every time Redis restarts, so this kind of data corruption cracks the foundation our software is built on.

xujiaxj avatar Oct 13 '20 21:10 xujiaxj

Forgot to mention: if we query the graph from the CLI, it returns the correct data.

graph.Query dev "match (n:EntityType{name:'Service'}) return n"

1) 1) "n"
2) 1) 1) 1) 1) "id"
            2) (integer) 350
         2) 1) "labels"
            2) 1) "EntityType"
         3) 1) "properties"
            2)  1) 1) "name"
                   2) "Service"
                2) 1) "_created"
                   2) (integer) 1602617008984
                3) 1) "_updated"
                   2) (integer) 1602617374081
                4) 1) "app"
                   2) "String"
                5) 1) "kube_service"
                   2) "String"
                6) 1) "namespace"
                   2) "String"
                7) 1) "workload"
                   2) "String"
                8) 1) "lookup_workload"
                   2) "workload | deployment | statefulset | daemonset | replicaset"
               9) 1) "workload_type"
                   2) "String"

xujiaxj avatar Oct 13 '20 21:10 xujiaxj

@xujiaxj Thanks for reporting. JRedisGraph (as well as all our RedisGraph clients) maintains a client-side cache that maps property, label, and relationship IDs to their string values. As you correctly wrote, JRedisGraph sends the query with a --compact flag, which causes RedisGraph to return a compact representation of the result set, containing only the properties' IDs. If JRedisGraph is missing any id<->string mapping, it triggers a procedure call to complete its mapping. We will check this and update you.

DvirDukhan avatar Oct 13 '20 23:10 DvirDukhan

Looks like JRedisGraph refreshes its local cache only if it detects an ID higher than the current max. Presumably, this indicates that a new label/relationshipType/property has been created.

However, if we delete some entities and recreate a few in such a way that a smaller ID is reused, will we run into the same wrong-mapping problem? From the code, it looks like we might.

xujiaxj avatar Oct 15 '20 23:10 xujiaxj

@xujiaxj Don't confuse entity IDs with property/relationship/label IDs. RedisGraph schema IDs take an "append-only" approach, meaning that for every property/label/relationship added, the id<->string mapping is appended to the respective mapping container. JRedisGraph caches the schema mapping, meaning it has its own view (state) of the graph schema. Since mapping IDs are never reused, only appended, JRedisGraph refreshes its cache when it receives a property/label/relationship ID for which it doesn't hold an id<->string mapping. Your issue deals with graph swapping (e.g., changing the graph key's value without client awareness).
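To make the graph-swapping failure mode concrete, here is a minimal, self-contained sketch. It does not use JRedisGraph or a real server: `FakeServerGraph` and `ClientCache` are hypothetical stand-ins, and the assumption that a property ID is simply its position in an append-only list is an illustration of the behaviour described above, not the actual implementation. The property names are taken from the outputs in the original report.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleSchemaCacheDemo {

    /** Hypothetical stand-in for one graph on the server: property names are append-only, ID = list index. */
    static class FakeServerGraph {
        private final List<String> propertyKeys = new ArrayList<>();

        /** Returns the ID of a property name, appending it if it has not been seen before. */
        int idFor(String name) {
            int id = propertyKeys.indexOf(name);
            if (id < 0) {
                propertyKeys.add(name);
                id = propertyKeys.size() - 1;
            }
            return id;
        }

        /** Stand-in for the procedure call a client uses to fetch the full id<->string mapping. */
        List<String> listPropertyKeys() {
            return new ArrayList<>(propertyKeys);
        }
    }

    /** Hypothetical stand-in for a client-side cache that refreshes only when it sees an unknown ID. */
    static class ClientCache {
        private List<String> cached = new ArrayList<>();

        String nameFor(int id, FakeServerGraph server) {
            if (id >= cached.size()) {
                // Unknown ID: assume the schema grew and refetch the mapping.
                cached = server.listPropertyKeys();
            }
            return cached.get(id);
        }
    }

    public static void main(String[] args) {
        // Graph as it existed before the crash.
        FakeServerGraph oldGraph = new FakeServerGraph();
        oldGraph.idFor("name");
        oldGraph.idFor("component");
        oldGraph.idFor("instance");

        ClientCache longRunningClient = new ClientCache();
        System.out.println(longRunningClient.nameFor(1, oldGraph)); // component

        // Server restarts; the graph is rebuilt with properties created in a different order.
        FakeServerGraph rebuiltGraph = new FakeServerGraph();
        rebuiltGraph.idFor("name");
        rebuiltGraph.idFor("app");
        rebuiltGraph.idFor("kube_service");

        // ID 1 is already cached, so no refresh happens and the old name is returned.
        System.out.println(longRunningClient.nameFor(1, rebuiltGraph)); // component (stale)

        // A freshly started client has an empty cache and resolves the ID correctly.
        System.out.println(new ClientCache().nameFor(1, rebuiltGraph)); // app
    }
}
```

In this sketch the long-running client never refreshes, because every ID it receives from the rebuilt graph is already inside its cached range. That matches the report: a restarted GraphQuerier sees correct names, while one that survived the Redis restart does not.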

DvirDukhan avatar Oct 16 '20 05:10 DvirDukhan

Hi @DvirDukhan, it looks like we ran into this data corruption issue in our prod environment, and it caused a complete outage. Looking at the metrics we collect from our Redis subscriptions in Redis Enterprise Cloud, we can see that all of the Redis instances we use were upgraded, which triggered the issue. Although the upgrade was made in all of our environments, the issue only occurred in our prod environment. The only difference is that there are more graphs in that environment (5 total). Our dev environments have anywhere from 1 to 3.

Another thing to note is that the only graph with no corruption issues was the first graph created on that Redis instance. The other 4 graphs, which were created at later dates, had problems. Only restarting our services that communicate with Redis resolved the corruption issue.

Our subscription numbers are: #1403485 and #1335035. There are 3 redisgraph deployments: prod-monitoring-redisgraph, prod-redisgraph, and dev-redisgraph. prod-redisgraph is the only one that had the issue.

We are running JRedisGraph 2.3.0.

Thanks!

arramos84 avatar Jul 06 '21 19:07 arramos84

Here is the time recorded for the RedisGraph update: 5:24 AM PT, Jul 4

Screen Shot 2021-07-06 at 12 47 57 PM

manoja1 avatar Jul 06 '21 19:07 manoja1