spicedb icon indicating copy to clipboard operation
spicedb copied to clipboard

Missing spans from trace for v1.PermissionsService/LookupResources

Open wthornton1 opened this issue 1 year ago • 7 comments
trafficstars

What platforms are affected?

linux

What architectures are affected?

amd64

What SpiceDB version are you using?

v1.27.0

Steps to Reproduce

  1. Make an authzed.api.v1.PermissionsService/LookupResources request
  2. Collect the traces from spicedb

Expected Result

All or most of the excution time should be accounted for, and should include the fan out via dispatch.

Actual Result

Approximately ~90% of the total execution time seems to be be missing

Screenshot 2023-11-27 at 19 49 18

wthornton1 avatar Nov 27 '23 20:11 wthornton1

@wthornton1 can you specify which datastore are you using?

vroldanbet avatar Dec 06 '23 14:12 vroldanbet

Currently using postgres 15.3 - I can edit the issue description if you want

wthornton1 avatar Dec 06 '23 14:12 wthornton1

@wthornton1 I've kind of reproduced this locally. It's the time spent streaming the various items back to the client. This is tricky to trace because it can generate a ton of traces (I tested it and my LookupResource call is filled with thousands of elements), and I'm not sure that's useful.

vroldanbet avatar Dec 06 '23 16:12 vroldanbet

Is there anyway to put some kind of overarching parent trace? ie. collecting responses or something of ths sort

wthornton1 avatar Dec 08 '23 12:12 wthornton1

@wthornton1 I didn't spend a lot of time into it but I couldn't figure out a way to aggregate this in a visually-compelling way. Different parts of the reachability graph are evaluated concurrently and the whole pipeline works async, so responses are published as they become available. It always led to a completely unusable trace with lots of small pieces. I'm going to summon @josephschorr with the hope he can suggest an approach to make those traces more usable.

vroldanbet avatar Dec 12 '23 08:12 vroldanbet

I've tried doing so myself, and run into the exact same problem

Edit: We either go super specific with lots of small pieces or we leave it very high level

josephschorr avatar Dec 13 '23 20:12 josephschorr

Thank you both - having confirmation about what that blank time is is helpful.

DBBrowne avatar Jan 13 '24 11:01 DBBrowne

We recently found that OTel client-side streaming interceptors weren't being installed, which meant that in an actual SpiceDB cluster spans from LookupResources traces wouldn't show up.

This has been fixed in https://github.com/authzed/spicedb/pull/1836, which instruments OTel using the new gRPC stats API for this.

vroldanbet avatar Apr 01 '24 08:04 vroldanbet

fantastic, thanks!

Happily, spicedb has been rock solid for us since November. python gRPC, not so much.

DBBrowne avatar Jun 11 '24 08:06 DBBrowne