Nettrace V5 wishlist
We've been contemplating a breaking change in the Nettrace format. Ideally I'd like to keep such changes infrequent and give lots of lead time for new versions of the reader to become widely deployed before shifting any existing scenario over to the new format. That means trying to anticipate needs that may still be years away and deciding what is worth the effort to implement in advance. This is a braindump of potentially useful format capabilities we could add:
- compressed event streams for smaller trace sizes
- encrypted event streams (asymmetric ciphers allow data to be encrypted when it is written and only decrypted by someone who knows the private key; a sketch of the general hybrid-encryption technique follows after this list)
- tracing events from multiple processes
- tracing events from multiple PID namespaces / virtual machines / physical machines (which might require support of multiple clock sources)
- embedding ETW events (or other event sources?)
- allow the writer to create defined ranges in the event stream that can be read in total isolation from the rest of the file. This would let a tool scan for a time range of interest, slice out just that portion of the file, and read it back on its own. This doesn't work today because there is a small amount of unique information in the file headers at the front, plus MetadataBlocks that are distributed arbitrarily within the file. If symbolication of stack trace IPs is required, the necessary JIT events or rundown events are also unlikely to be available locally.
- allow encoding process start/stop, thread start/stop, and module load/unload in a platform neutral way #1265
- Eliminate the requirement for the reader to track the current read offset relative to the beginning of the stream in order to determine the size of padding at the start of aligned blocks. (We can either make the padding size self-describing or get rid of it.)
- ability to encode multiple concurrent streams of events. This could be used to allow quicker access to certain subsets of the events or to efficiently multiplex multiple nettrace streams into a single stream.
- 1st class symbolic data emitted just-in-time - in the same way that we emit event metadata and stacks just-in-time so that events can refer back to it, we could have explicit symbolic data emitted where it is needed rather than having to scrape it from JIT events or rundown events. It takes some extra overhead for writers to track and emit this, but it could save tools substantial difficulty in re-assembling symbolic data out-of-band.
- better file format extensibility (and smaller block headers?). Right now nettrace uses the FastSerialization format to encode all of the blocks. By default the parser fails if it encounters any block it doesn't recognize, and blocks don't encode their own length in a uniform way, so even an updated reader wouldn't know how to skip them. A more uniform length-prefixed scheme would let us ignore new blocks by default instead of failing (a rough sketch follows after this list). As I recall the FastSerializer block header is also fairly verbose, I think 20-30 bytes per block? I believe blocks are relatively infrequent right now, so size isn't a huge concern, but the large header size means it could be quite inefficient to add a large number of small-payload blocks in the future. Size and complexity in the block header also might play a role in how efficiently we can seek within the file to find just the subset of data we are interested in.
- make the trace describe which providers/events/keywords/filterArgs were used to collect it. If the configuration can change during the trace, then the updates are also logged just-in-time within the trace.
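
To illustrate the encrypted-stream bullet above, a common pattern is hybrid encryption: the writer encrypts each chunk of trace data with a fresh symmetric key and stores that key wrapped with the reader's public RSA key. This is only a sketch of the general technique using .NET's built-in crypto APIs (the `EncryptCbc`/`DecryptCbc` helpers need .NET 6 or later); the type and method names are invented for illustration and none of it is a proposed nettrace layout:

```csharp
using System;
using System.Security.Cryptography;

static class HybridEncryptionSketch
{
    // Encrypts a chunk of trace data with a fresh AES key, then wraps that key
    // with the collector's public RSA key. Only the holder of the matching
    // private key can unwrap the AES key and decrypt the payload.
    public static (byte[] wrappedKey, byte[] iv, byte[] ciphertext) Encrypt(
        byte[] payload, RSA readerPublicKey)
    {
        using var aes = Aes.Create();
        aes.GenerateKey();
        aes.GenerateIV();

        byte[] ciphertext = aes.EncryptCbc(payload, aes.IV);
        byte[] wrappedKey = readerPublicKey.Encrypt(aes.Key, RSAEncryptionPadding.OaepSHA256);
        return (wrappedKey, aes.IV, ciphertext);
    }

    public static byte[] Decrypt(
        byte[] wrappedKey, byte[] iv, byte[] ciphertext, RSA readerPrivateKey)
    {
        byte[] key = readerPrivateKey.Decrypt(wrappedKey, RSAEncryptionPadding.OaepSHA256);
        using var aes = Aes.Create();
        aes.Key = key;
        return aes.DecryptCbc(ciphertext, iv);
    }
}
```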
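
And to make the "skip unknown blocks" idea from the extensibility bullet concrete, here is a minimal sketch of a reader loop over a hypothetical tag + length-prefixed block header. The tag values and header layout are invented for this example and are not part of the current nettrace format:

```csharp
using System;
using System.IO;

// Hypothetical V5-style block header: a 4-byte tag plus a 4-byte payload length.
// Because every block carries its own length, a reader can seek past any tag it
// does not understand instead of failing the whole parse.
enum BlockTag : uint
{
    Metadata = 1,
    EventBlock = 2,
    StackBlock = 3,
    // Future writers may emit tags this reader knows nothing about.
}

static class BlockReader
{
    public static void ReadAllBlocks(Stream stream)
    {
        var reader = new BinaryReader(stream);
        while (stream.Position < stream.Length)
        {
            uint tag = reader.ReadUInt32();           // which kind of block this is
            uint payloadLength = reader.ReadUInt32(); // bytes that follow the header

            long payloadStart = stream.Position;
            switch ((BlockTag)tag)
            {
                case BlockTag.EventBlock:
                    ParseEventBlock(reader, payloadLength);
                    break;
                default:
                    // Unknown block: silently skip it instead of throwing.
                    break;
            }

            // Always land exactly at the end of the payload, even if the parser
            // consumed less (or none) of it.
            stream.Position = payloadStart + payloadLength;
        }
    }

    static void ParseEventBlock(BinaryReader reader, uint length)
    {
        // Placeholder for real event parsing.
    }
}
```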
cc @mjsabby @josalem @sywhang @davmason
cc @brianrob
> (or other event sources?)
perf events on Linux? We already know we can correlate events between a nettrace file and any other events collected using the same CLOCK_MONOTONIC.
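
To spell out what that correlation amounts to: once each source records a single sync point on the same clock, lining events up is just arithmetic. A hedged sketch (the sync-point shape below is hypothetical, not an actual nettrace or perf header field):

```csharp
using System;

// Sketch: if the nettrace writer and a perf-based collector each record one sync
// point (a CLOCK_MONOTONIC reading paired with a wall-clock time), every event
// timestamp taken on that same CLOCK_MONOTONIC can be placed on a shared timeline.
struct ClockSyncPoint
{
    public ulong MonotonicNanosAtSync; // CLOCK_MONOTONIC value at the sync moment
    public DateTime WallClockAtSync;   // wall-clock time at the same moment
}

static class ClockCorrelation
{
    public static DateTime ToWallClock(ClockSyncPoint sync, ulong monotonicNanos)
    {
        long deltaNanos = (long)(monotonicNanos - sync.MonotonicNanosAtSync);
        return sync.WallClockAtSync.AddTicks(deltaNanos / 100); // DateTime ticks are 100 ns
    }
}
```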
> 1st class symbolic data emitted just-in-time - in the same way that we emit event metadata and stacks just-in-time so that events can refer back to it, we could have explicit symbolic data emitted where it is needed rather than having to scrape it from JIT events or rundown events. It takes some extra overhead for writers to track and emit this, but it could save tools substantial difficulty in re-assembling symbolic data out-of-band.
1000%. I would love to reduce or eliminate the reliance on rundown. I think we should look for tradeoffs we're willing to make in exchange for no rundown. We could still leave rundown as an option since it might make sense in some scenarios.
I am sorry if this is not the correct place to ask a question. We built a custom UI over the Microsoft.Diagnostics.Tracing.TraceEvent NuGet package and it worked fine until we moved our service to a Linux Azure App Service. Following the guide in the section "How to leverage dotnet-trace in App Services Linux", we are able to produce a nettrace file. The questions are: Is it possible to replay this type of file like a normal ETL file? Is there a way to convert from nettrace to ETL? What is your advice for a solution to our problem?
Thank you in advance.
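
In case it is useful: TraceEvent can read a .nettrace file directly with EventPipeEventSource, so events can be replayed via callbacks much like an ETL file without converting it first. A minimal sketch, assuming the Microsoft.Diagnostics.Tracing.TraceEvent package (the file path is a placeholder):

```csharp
using System;
using Microsoft.Diagnostics.Tracing;

class Program
{
    static void Main()
    {
        // EventPipeEventSource understands the nettrace format produced by dotnet-trace.
        using (var source = new EventPipeEventSource("trace.nettrace"))
        {
            // The dynamic parser fires a callback for every event, similar to ETW replay.
            source.Dynamic.All += traceEvent =>
                Console.WriteLine($"{traceEvent.TimeStamp:O} {traceEvent.ProviderName}/{traceEvent.EventName}");

            source.Process(); // replays the whole file
        }
    }
}
```

If I recall correctly, the same package also offers TraceLog.CreateFromEventPipeDataFile when a TraceLog-style (ETLX) view is needed; producing an actual ETL file is a separate matter.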