Discussion: Should we have a new simple to emit but powerful trace format?
We've been hearing from users who convert to Perfetto (i.e. not recording with Perfetto tools, not using the SDK, etc.) that they want something easier than protobuf but more capable than Chrome JSON. (Specifically see this thread, but I've seen this sentiment echoed elsewhere as well.)
There's a bunch of trace formats out there already (https://xkcd.com/927/) so do we really need another one? This issue is to discuss whether we should do something about this, and if so, what.
Current state
Chrome JSON format (spec)
Pros:
- By far the most popular format people actually emit
- Trivially easy to write
- De facto standard - lots of tools support it
Cons:
- Very pid/tid focused (designed for Chrome internal tracing)
- Feels weird using it for GPUs, network requests, etc.
- No arbitrary nesting of tracks (cannot have group 1 -> group 2 -> track). Only nesting possible is "thread" then "process"
- No interning (though maybe we don't want interning in a "simple" format anyway)
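To illustrate "trivially easy to write": a complete, loadable Chrome JSON trace is a few lines of Python (event shape per the Chrome JSON spec; `ts`/`dur` are in microseconds):

```python
import json

# A minimal Chrome JSON trace: two complete ("X") events on pid 1 / tid 1.
trace = {
    "traceEvents": [
        {"name": "parse_config", "ph": "X", "ts": 0, "dur": 1500, "pid": 1, "tid": 1},
        {"name": "load_assets", "ph": "X", "ts": 1500, "dur": 3000, "pid": 1, "tid": 1},
    ],
    "displayTimeUnit": "ms",
}

with open("trace.json", "w") as f:
    json.dump(trace, f)
```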
Perfetto protobuf format
Pros:
- Efficient, lots of features
- Good documentation
- Can emit it in Python reasonably easily (docs)
Cons:
- It's protobuf - not trivially easy to write like JSON
- This is exactly why people still use Chrome JSON despite its limitations
What should we do?
1. Do nothing
Pros:
- Simple for us
- Avoids "yet another format" in the industry
- Keeps us aligned with well-used formats
Cons:
- People stuck with substandard representation of their data
- Forces users to either use other tools or take on protobuf complexity
2. JSON encoding of protobuf
This would mean accepting the JSON representation of our protobuf schema. Instead of emitting binary protobuf, you'd emit JSON that follows the same structure as our protobuf messages. For example, instead of constructing a protobuf TrackEvent, you'd write JSON like {"track_uuid": 123, "name": "my_event", ...} that maps to the protobuf fields.
Pros:
- Minimal support burden (just define which parts of protobuf we support in JSON + write thin C++ parsing layer)
- Get protobuf's features with JSON's ease of use
Cons:
- Can be confusing - now there are two "Perfetto" formats, similar but different
- Documentation/education burden
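To make the idea concrete, here's a sketch of what such a trace could look like. The field names mirror the existing TracePacket/TrackEvent protobuf messages, but the JSON shape itself is hypothetical and would be part of the design:

```python
import json

# Hypothetical sketch: each JSON object mirrors one TracePacket message.
# Field names follow the existing proto schema (TrackDescriptor, TrackEvent);
# the JSON encoding itself does not exist yet.
packets = [
    # Define a track (no pid/tid required, unlike Chrome JSON).
    {"track_descriptor": {"uuid": 1, "name": "My GPU queue"}},
    # A slice on that track.
    {"timestamp": 1000,
     "track_event": {"type": "TYPE_SLICE_BEGIN", "track_uuid": 1, "name": "render"}},
    {"timestamp": 2500,
     "track_event": {"type": "TYPE_SLICE_END", "track_uuid": 1}},
]
print(json.dumps(packets, indent=2))
```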
3. Invent something new
Pros:
- Can design something "optimal"
- Clearer differentiation from existing formats
Cons:
- Hard to know where to start
- If it's JSON-based (which it probably would be), why not just do option 2?
Personal opinion
I think option 2 is probably the best compromise, followed by option 1 if we decide not to do this. I don't feel too strongly though, so I'm very open to discussion on this topic.
Questions for anyone reading this
- What's been your experience with current formats?
- If you use Chrome JSON - what specific pain points do you hit?
- If you avoid protobuf - is it the complexity or something else?
- Would a JSON encoding of protobuf actually solve your problems?
- How should we market a JSON encoding of protobuf to make it as easy to understand as possible?
Please chime in with your thoughts!
Big +1 to the ease of use of JSON: its ubiquity is extremely useful when trying to get folks to try tracing. If someone can see that they can build a prototype themselves in ~a minute, the chances of them doing more are much higher than if they have to figure out protobuf integration and tinker with the build system.
As far as Chrome JSON is concerned, I have two pain points:
- tracks (larger issue): customising the names of async tracks is a very common use case that users hit early, and Chrome JSON doesn't have a way to do it. There is also no support for more complex track structure (e.g. child tracks).
- flows (smaller issue): flow event support in Chrome JSON traces is quite unintuitive with FLOW_IN / FLOW_OUT / FLOW_INOUT.
In general I'm in favour of option 2, but I think there are two issues with emitting TrackEvent traces manually that ruin the magic "it just works" moment (which is the main benefit of Chrome JSON traces). If you forget either of them, you get a broken trace with no clear indication of what went wrong (both have easy fixes, but they still add a cognitive barrier at a very important point of the discovery user journey):
- trusted sequence id: each packet should have a "trusted sequence id" set.
- incremental state: you have to set the INCREMENTAL_STATE_RESET on the first packet in the emitted trace.
If we go with option 2, I think we should have sensible presets for these two cases (e.g. do not allow protobuf-json traces to set trusted sequence id / do anything with incremental state).
If you forget to do either of them, you will get a broken trace without clear understanding of what went wrong
So re "things going wrong without you understanding": this is something I'm fixing anyway. See https://github.com/google/perfetto/pull/3357 and https://github.com/google/perfetto/pull/3455, which take big steps forward towards discoverability of these things.
I think we should have sensible presets for these two cases (e.g. do not allow protobuf-json traces to set trusted sequence id / do anything with incremental state).
trusted sequence id: each packet should have a "trusted sequence id" set.
I totally agree. In fact, I think we should drop this for proto traces also. If it's not set, we just assume zero.
incremental state: you have to set the INCREMENTAL_STATE_RESET on the first packet in the emitted trace.
I might be wrong, but AFAIK you actually don't need to do this today at all if your other packets don't set NEEDS_INCREMENTAL_STATE. I agree it should not be necessary on "simple" traces, even for proto.
In general, anything we do for a "Perfetto JSON format" we should also do for Perfetto protobuf. I really would keep the formats fully in sync if we go that route. There's nothing more confusing, for me and for users, than having slightly different semantics between the two.
Oh also: I think we should support "inline track definition" (i.e. no indirection of track_uuid at all). I think that would make the format even simpler to implement as fundamentally track uuids are a form of interning and so should be considered "advanced".
My vote's for 2.
I feel like the main advantages of the Chrome JSON format are:
- A lot more well known, and thus easier to find examples of
- Simpler, easier to understand
- Human readable and writable without a build step
- Easy to generate using your programming language of choice
While we can never hope to compete with 1 and 2, I think 3 and 4 can be solved by having a JSON representation of the protobuf format as you suggested in option number 2.
Minimal support burden (just define which parts of protobuf we support in JSON + write thin C++ parsing layer)
QQ: It seems like you're suggesting we could only support a subset of the format in JSON? Why is that the case?
I totally agree. In fact, I think we should drop this for proto traces also. If it's not set, we just assume zero.
Yep, would be great if we could do that!
I might be wrong but AFAIK uou actually don't need to do this today at all if your other packets don't set NEEDS_INCREMENTAL_STATE. I agree it should not be necessary to do this on "simple" traces, even for proto.
I did run into problems earlier this year due to absence of INCREMENTAL_STATE_RESET, but maybe I was setting NEEDS_INCREMENTAL_STATE? Not 100% sure, will double-check.
I think we should support "inline track definition" (i.e. no indirection of track_uuid at all). I think that would make the format even simpler to implement as fundamentally track uuids are a form of interning and so should be considered "advanced".
This is interesting. I'd say that track_uuid is not just transparent interning, but rather part of defining what a track is (similar to a flow id): e.g. async tracks with different ids but the same name is one of the most common use cases, so I'm not sure how many practical uses there will be for inline track definition. Intuitively, however, supporting it seems like a good idea.
Generally I'd be in favor of "something that is 1:1 in sync with the track_event.proto schema, but has another encoding". Essentially we'd use the proto as the source of truth for the schema, but allow some other encoding.
If we went there, it would be nice to have something:
- easy to write for embedded/microcontrollers
- that leads to small traces if you have a lot of data. JSON is very verbose and we have seen internal users ending up with enormous traces where most of the size is JSON-induced cruft
1) If we go for JSON, could we figure out an incremental update to the chrome json rather than starting from scratch?
I would have to think a bit more about what this looks like. Essentially something that allows "upgrading" from the canonical Chrome JSON event format to our track-event-proto-derived JSON format. It probably boils down to introducing some new, currently unused "ph" in the Chrome JSON with the semantic "this follows the newer TrackEvent format".
I think it would be a bit less confusing than "we support two JSON formats". Imagine the chaos in the community when people are trying to figure out which one to use.
2) Should we use JSON?
I'm just not 100% sure I'm sold on JSON. JSON is still a bit of a pain to emit, especially in the embedded context: it requires balancing braces and maintaining state. If you miss a brace, or mess up one quote, the entire file becomes invalid.
If we went here, I'd love to explore some other text-format. My concrete proposals here would be:
1: NDJSON (Newline Delimited JSON), a text-based data format where each line is a self-contained, valid JSON object.
2: A more minimalistic "one event per line" format inspired by logfmt:
t=1234;n=my event;c=category1;c=category2;da=n=debug annotation;v=FirstGestureScrollUpdate;;
Where essentially:
- We could allow shorter aliases ("t" rather than "timestamp"), but still allow the long name from the proto schema (as we are not going to define aliases for every possible field, we could only annotate some)
- You don't need any quotes (we know upfront if something is a string from the schema, so there is no need to disambiguate p=123 from p="123"; the schema decides the interpretation, not the encoding)
- You use ; (a rarely used character in traces) as a separator. This avoids the need for quotes around strings and reduces the overall bytes of the trace. Of course, if you have a ";" in a name it must be escaped (\;)
- Nesting can be achieved using just =, because we have the schema. Every ; pops one level from the stack (hence the ;; in da=n=...;;)
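As an illustration only (this is a sketch of the proposal, not an implementation), tokenizing such a line with the `\;` escaping described above is a few lines of Python; the nesting semantics are glossed over here:

```python
import re

def split_fields(line: str) -> list[str]:
    # Split on ";" that is not preceded by a backslash, then unescape "\;".
    # Nesting ("=" pushes a level, each ";" pops one) is not handled here.
    parts = re.split(r'(?<!\\);', line)
    return [p.replace('\\;', ';') for p in parts if p]

fields = split_fields(r"t=1234;n=my event\; with semicolon;c=category1")
# fields == ['t=1234', 'n=my event; with semicolon', 'c=category1']
```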
Other alternatives to consider:
- ZJSON: Zed's JSON, a bit more compact: see https://zed.brimdata.io/docs/formats/zjson
- JSON5 (although I'd still enforce a "one event per line"): https://json5.org/
- Any of these formats that Zed's zq supports https://zed.brimdata.io/docs/commands/zq
In any case
The two properties that matter to me are:
- Whatever we come up with, there should be a self-maintaining tool that can convert between proto and the text format, and that tool should be autogenerated from our .proto schema (rather than manually maintained)
- Having some tooling like `jq` or `zq` to filter/search raw data. I feel this is the biggest limitation we have with protobuf when we debug raw traces; the state of the art today is dumping txtprotos and using grep/awk
If I had to pick one
I'd go with NDJSON. It's the best compromise: not the most compact, but it works well with jq (to be confirmed, AI suggests yes), which is the king of tooling.
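A sketch of why NDJSON is easy on the writer side: one `json.dumps` call per event, so the emitter never has to balance braces across the whole file, and a truncated trace is still parseable up to the last complete line. The packet shape here reuses the hypothetical JSON-encoded TracePacket fields and is illustrative only:

```python
import json

# Illustrative NDJSON emission: one self-contained JSON object per line.
events = [
    {"track_descriptor": {"uuid": 1, "name": "worker"}},
    {"timestamp": 100,
     "track_event": {"type": "TYPE_INSTANT", "track_uuid": 1, "name": "tick"}},
]

with open("trace.ndjson", "w") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
```

`jq` then works line by line on such a file (e.g. `jq 'select(.track_event)' trace.ndjson`), with no need to load the whole trace as one document.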
My cautious vote goes out to option 2, a JSON representation of the protobuf format. As already mentioned, this makes the JSON variant of the Perfetto Trace format a "mere" different encoding of the same thing, leaving little room for disagreement in semantics between the two encodings.
However, I think it would be good to reality-check some small example traces. I have looked at perfetto_trace.proto, which I hope is the correct proto definition file for the Perfetto Trace format. There seem to be a number of ways to represent data, see e.g. the timestamp encoding situation where there are deprecated variants. The check would be to write a couple of JSON traces (which I think could be loaded into protobuf format through protobuf tooling) and see (a) whether they were obvious to write and (b) whether they load correctly.
For the embedded use cases mentioned by @primiano I can of course only speak for myself. I personally would probably split the pipeline anyway in this case: the embedded platform generates a simple format (I have used approaches similar to NDJSON in the past), and a program on a larger computer transcribes it to the Perfetto trace format. Most properties can be mapped over directly, e.g. to span arguments. Clearly, this is an extra step. However, it frees the embedded platform from a number of considerations. For instance, it doesn't need to generate timestamps in nanoseconds (it might keep time at some odd frequency and not have the resources to do 64-bit arithmetic for every timestamp). Or, for another example, you can overlay structure (i.e. the nesting of groups) in post-processing, where you can iterate more quickly.
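The timestamp point above is cheap to handle on the host side. A sketch, assuming a hypothetical 32768 Hz timer on the embedded device:

```python
# Post-processing step: the embedded side logs raw tick counts; the host
# converts them to the nanosecond timestamps the trace format expects.
TICK_HZ = 32768  # assumed timer frequency; use whatever your platform has

def ticks_to_ns(ticks: int) -> int:
    # Multiply before dividing to keep precision; Python ints are arbitrary
    # precision, so there are no 64-bit overflow concerns on the host.
    return ticks * 1_000_000_000 // TICK_HZ

print(ticks_to_ns(32768))  # one second of ticks -> 1_000_000_000 ns
```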
What I mentioned in the Lobste.rs thread (see top link in the OP) is that I see the Chrome and Perfetto tracing formats as formats that are closer to visualization than to machine-readable data transmission (like structured logging). Considering this, I feel comfortable with the two-step approach outlined above. Of course YMMV and I am very interested in how others trace :)
What's been your experience with current formats?
I think we are the weird rare bunch of people who are generating protobuf (directly in C), so for us the only problem was a lack of documentation in the early days, but that has been solved. So not super helpful, but we are Option 1 : )
I have found Chrome Trace JSON to be simple to use, and it is my preferred format for small quick projects that emit synthetic traces, primarily because:
- It is trivial to write code to emit it.
- JSON itself is very widely used and easy to understand, making the format very accessible.
- It doesn't require any specific programming language, third-party libs, or build system to emit, unlike synthetic protobuf and the Tracing SDK.
- Common scripting languages like Python and JavaScript make it easy to serialize/deserialize using their standard libraries.
- The trace representation is plain text, so I don't need to dump or convert it in order to inspect it.
Eventually, when a project becomes complex enough, I will deal with the complexities and migrate to either the Tracing SDK or synthetic protobufs for the following reasons:
- more compact trace representation
- better serialization/deserialization perf vs using plain text.
- more flexible in terms of what events can be represented.
I don't think we can get the technical benefits of protobuf without giving up simplicity, but I think a simple format could potentially be more expressive, similar to the protobuf format.
Chrome JSON is definitely imperfect, and I'd appreciate a new, simpler format, but it has also been good enough for me. I also don't feel too strongly on this. I'd probably use option 2 if it existed, but would also not feel like I'm missing much if we just did option 1 (do nothing).
My 2 cents on the wire format:
- a binary format like CBOR would go a long way in improving the efficiency of encoding and decoding, but it does make traces more opaque and hard to read, so it's probably not realistic
- if JSON is used, please use ndjson/jsonl instead of a single object, it's so much easier to work with
- if a text format is used but doesn't have to be JSON, I'd suggest resp3 (the redis wire format) because it's:
- extremely simple to emit (more than JSON, since no string escaping is required) and parse
- designed for streaming
- binary safe if needed, but if all strings are regular text then it's also regular text
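To show what "no string escaping" means in practice, here is a sketch of RESP-style framing (length-prefixed bulk strings, as in the Redis protocol). The event shape is made up purely for illustration:

```python
# RESP-style framing: strings are length-prefixed, so the payload bytes are
# never inspected and need no escaping, even if they contain quotes or CRLF.
def resp_bulk(s: bytes) -> bytes:
    # Bulk string: $<len>\r\n<payload>\r\n
    return b"$%d\r\n%s\r\n" % (len(s), s)

def resp_event(*fields: str) -> bytes:
    # Encode one event as a RESP array of bulk strings.
    parts = [b"*%d\r\n" % len(fields)]
    parts += [resp_bulk(f.encode()) for f in fields]
    return b"".join(parts)

# A payload with quotes and newlines needs no special handling.
msg = resp_event("ts=1234", 'name=has "quotes" and \r\n newlines')
```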