
Zed --explain flag uses cached results

Open. winstaan74 opened this issue 1 year ago • 7 comments

What platforms are affected?

macOS, others

What architectures are affected?

others

What SpiceDB version are you using?

v1.35.3

Steps to Reproduce

In Zed, the --explain flag uses cached results, even when --consistency-full is set. This makes it hard to get a full trace of how a permission decision was reached. For example, running the same permission check twice gives a different explanation each time:

❯ zed --insecure permission check --explain timesheet:1 read_timesheet user:123 --consistency-full                                              
4:00PM INF debugging requested on check
true
✓ timesheet:1 read_timesheet (2.347208ms)
└── ✓ engagement:3 read_timesheet (1.806125ms)
    ├── ⨉ engagement:3 supplier_for_attribute (994.125µs)
    ├── ⨉ engagement:3 manages_attribute (1.536667ms)
    └── ✓ engagement:3 self_attribute (1.711417ms)
        └── ✓ person:2 user (1.196667ms)
            └── user:123 

❯ zed --insecure permission check --explain timesheet:1 read_timesheet user:123 --consistency-full                                              
4:00PM INF debugging requested on check
true
✓ timesheet:1 read_timesheet (cached)
└── user:123 

I see the same behaviour when using the gRPC API from a Java client with the 'debug' flag set. What I want is an explanation of the permission-checking path that can be displayed to a user.

Expected Result

A full permissions check trace each time.

Actual Result

The explanation for the second permission check is minimal: it only reports that the result was cached.

winstaan74, Sep 18 '24 15:09

--consistency-full indicates that the most recent revision must be used. If you make two requests and there have been no writes to SpiceDB in between, both are evaluated at the same revision, so serving the second one from cache is correct. --explain always returns an explanation of the work SpiceDB actually performed.
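To make the caching argument concrete, here is a toy Python model of a revision-keyed check cache. It is a hypothetical sketch, not SpiceDB's implementation: ToyChecker, check, write and the bypass_cache parameter are all invented for illustration, and the graph walk is reduced to a set lookup. Whether a per-request cache bypass exists is an open question later in this thread. The point is that with full consistency both checks run at the latest revision, so when nothing has been written in between, the second check hits the same cache key.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChecker:
    """Hypothetical model of a revision-keyed check cache (not SpiceDB's code)."""

    relationships: set                            # (resource, permission, subject) tuples
    revision: int = 1                             # bumped on every write
    cache: dict = field(default_factory=dict)

    def write(self, rel):
        self.relationships.add(rel)
        self.revision += 1                        # each write creates a new revision

    def check(self, resource, permission, subject, bypass_cache=False):
        # Full consistency: always evaluate at the latest revision.
        key = (self.revision, resource, permission, subject)
        if not bypass_cache and key in self.cache:
            return self.cache[key], "cached"      # same revision, same answer
        result = (resource, permission, subject) in self.relationships
        self.cache[key] = result
        return result, "computed"


checker = ToyChecker({("timesheet:1", "read_timesheet", "user:123")})
args = ("timesheet:1", "read_timesheet", "user:123")
first = checker.check(*args)                      # (True, "computed")
second = checker.check(*args)                     # (True, "cached"): no writes in between
forced = checker.check(*args, bypass_cache=True)  # (True, "computed")
checker.write(("timesheet:2", "read_timesheet", "user:123"))
after_write = checker.check(*args)                # (True, "computed"): new revision, new key
```

In this model the cached answer is always correct for its revision; the only thing lost is the detailed trace, which is exactly the trade-off discussed below.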

This is working as intended.

Can you expand on what you're trying to do?

josephschorr, Sep 18 '24 15:09

This is for a SpiceDB instance running locally, so I can confirm there are no writes between one zed --explain and the next.

On our production system, we'd like to be able to understand why a particular permission decision was made, and I think this is probably a common use case. In some cases, we'd like to display the explanation to an expert end user.

If a permission decision is 'incorrect' for us, it's usually because a user has set something up incorrectly, and we'd like to identify the issue by looking at how SpiceDB followed relationships to reach the decision.

If an explain request doesn't evaluate the whole graph, but uses previously cached results, then it becomes harder to understand the cause of a permission decision.

The current behaviour of --explain is great for understanding where time is spent in an actual permission check, but in some cases it would be helpful to force a full evaluation that bypasses all caches.

winstaan74, Sep 19 '24 09:09

"If an explain request doesn't evaluate the whole graph"

Explain won't evaluate the whole graph even if caching is not used; if a permission is granted via two paths, you may only get one or the other back, depending on which was found first.
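A toy Python sketch of why a single explain trace can omit valid paths. The graph below is hypothetical: it is loosely modelled on the trace in the issue, but unlike that trace it lets user:123 reach engagement:3 through both manages_attribute and self_attribute, so two granting paths exist. first_path is an invented helper that evaluates a union branch by branch and stops at the first success. In this sequential sketch the branch order decides which path is reported; an evaluator that runs branches concurrently could report either one.

```python
from typing import Optional

# Hypothetical graph: a node is granted if any child is granted.
# Nodes equal to the subject being checked are leaves.
GRAPH = {
    "timesheet:1#read_timesheet": ["engagement:3#read_timesheet"],
    "engagement:3#read_timesheet": [
        "engagement:3#supplier_for_attribute",
        "engagement:3#manages_attribute",
        "engagement:3#self_attribute",
    ],
    "engagement:3#supplier_for_attribute": [],
    "engagement:3#manages_attribute": ["user:123"],   # second path, added for the example
    "engagement:3#self_attribute": ["person:2#user"],
    "person:2#user": ["user:123"],
}


def first_path(graph: dict, node: str, subject: str) -> Optional[list]:
    """Short-circuiting union: return the first granting path, ignore the rest."""
    if node == subject:
        return [node]
    for child in graph.get(node, []):
        path = first_path(graph, child, subject)
        if path is not None:
            return [node] + path          # stop at the first granting branch
    return None


path = first_path(GRAPH, "timesheet:1#read_timesheet", "user:123")

# Same relationships, different branch order: a different (equally valid) path is reported.
reordered = {
    **GRAPH,
    "engagement:3#read_timesheet": list(reversed(GRAPH["engagement:3#read_timesheet"])),
}
other_path = first_path(reordered, "timesheet:1#read_timesheet", "user:123")
```

Both calls return a granting path, but neither mentions the other route, which is why an explanation of a single check is not an audit of every reason access is granted.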

It sounds like you want more of an "audit" ability, but that has significant performance implications, since it would have to bypass both the cache and short-circuiting.
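By contrast, an "audit" walk has to give up both optimisations. The hypothetical all_paths helper below explores every branch with no early return and no memoisation, and counts node visits to make the extra work visible. The graph and all names are invented for illustration; this is not how SpiceDB's Check API is implemented.

```python
from collections import Counter

# Hypothetical graph: two separate routes both reach user:123.
GRAPH = {
    "doc:1#view": ["group:a#member", "group:b#member"],
    "group:a#member": ["user:123"],
    "group:b#member": ["group:c#member"],
    "group:c#member": ["user:123"],
}


def all_paths(node: str, subject: str, visits: Counter) -> list:
    """Exhaustive walk: no cache and no short-circuiting, so every path is collected."""
    visits[node] += 1                        # record work done, to show the cost
    if node == subject:
        return [[node]]
    paths = []
    for child in GRAPH.get(node, []):        # no early return: every branch is explored
        for path in all_paths(child, subject, visits):
            paths.append([node] + path)
    return paths


visits = Counter()
paths = all_paths("doc:1#view", "user:123", visits)
total_visits = sum(visits.values())
```

Here the walk makes six visits and finds both paths, whereas a sequential short-circuiting walk over the same graph could stop after three (doc:1#view, group:a#member, user:123). On real schemas with wide unions or deep nesting, that gap grows quickly, which is the performance concern raised above.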

How often would you expect this feature to be used?

josephschorr, Sep 24 '24 17:09

@winstaan74 Checking in on this

josephschorr, Oct 17 '24 20:10

I can understand why this is desired: if you're debugging the output of checks, having to wait until the cache expires is a painful dev experience. So we'd like zed permission check --explain to always bypass the cache.

It would probably be good to document this in the --help output, and also to document that you may get a different execution trace each time.

From an implementation perspective, I need to see whether it's possible today to bypass the cache on a per-request basis 🤔

miparnisari, Apr 04 '25 00:04

@winstaan74 could you clarify whether you only want to skip the cache (which could have a significant performance cost given the wrong schema or relationships), or whether you want to see all the possible execution paths that may grant a permission (which would likely be a significant change inside the Check API, since it currently returns as soon as it finds one path)?

miparnisari, Apr 04 '25 20:04

@winstaan74 Ping

josephschorr, Apr 24 '25 19:04