codepropertygraph icon indicating copy to clipboard operation
codepropertygraph copied to clipboard

Performance matrix logger

Open pandurangpatil opened this issue 1 year ago • 10 comments

Added one utility to record the performance matrix in the following format.

  1. One if you initialize method and set the file where to write the matrix in csv format then it will start writing to that specific file with the configured interval of time.
DateTime MainStage (group of passes like overlay pass) SubStage (Pass to be executed) MetaData (some additional data like which CPG pass is being used) CPU (%) Used Memory (MB) Max Heap (MB)
  1. It will also log debug logs when a pass is invoked and when the pass is ended with the time it took to finish the pass.

Sample report performancematrix.csv

pandurangpatil avatar Feb 29 '24 14:02 pandurangpatil

Good idea, @mpollmeier could you give this a look? I recall some basic metrics captured in ODB but not sure if it's done at this level to include memory usage etc

From what I can see and think of we only collect very basic metrics like number of changes in overflowdb's (and flatgraph's) DiffGraphApplier. The problem with time recorders is that it's not comparable between machines - I still think it's useful to have though! I will have a read through here now.

mpollmeier avatar Mar 07 '24 10:03 mpollmeier

Good idea, @mpollmeier could you give this a look? I recall some basic metrics captured in ODB but not sure if it's done at this level to include memory usage etc

From what I can see and think of we only collect very basic metrics like number of changes in overflowdb's (and flatgraph's) DiffGraphApplier. The problem with time recorders is that it's not comparable between machines - I still think it's useful to have though! I will have a read through here now.

@mpollmeier even though it's not comparable between machines. It will be really helpful to understand which pass requires optimisation based on this data from a given machine. Otherwise, we are clueless and look through code, make guesses or use the profiler.

pandurangpatil avatar Mar 07 '24 11:03 pandurangpatil

mostly +1 to what David said

It'd be useful to have a minimised integration test, e.g. run the io.shiftleft.mypass.SamplePass (or similar) and verify that the measurements work in principle.

Unit test has been added.

pandurangpatil avatar Mar 08 '24 18:03 pandurangpatil

@DavidBakerEffendi @mpollmeier I have added comprehensive documentation to the file which will try to explain the implementation. Let me know if you have further questions.

I have also addressed the majority of review comments in the code. Where I think it's difficult to make the changes, I have replied to the comment.

pandurangpatil avatar Mar 08 '24 18:03 pandurangpatil

+1 for what David said - currently this is global mutable state, which may be accessed and modified by concurrently running CpgPasses. (yes, currently they run in sequence, but the idea is that we can parallelise independent ones)

Probably I am not able to connect how making Class we can achieve object not being accessed concurrently. If we have to write the status to the common file, we will have to access the file concurrently.

pandurangpatil avatar Mar 08 '24 18:03 pandurangpatil

@DavidBakerEffendi @mpollmeier

We have to record the data from start to end.

we need to make sure we initialise it outside any of the passes and end it towards the end when everything is done.

As well as we should be able to start a stage from passes and end the respective stage there. We need to have something like Singleton solution. Otherwise, we have to pass on the object through every pass created in Joern.

pandurangpatil avatar Mar 11 '24 08:03 pandurangpatil

We need to have something like Singleton solution. Otherwise, we have to pass on the object through every pass created in Joern.

It may well be very complicated, and I'm open for a singleton, but I'd like to at least give it a try via argument passing. Most passes are invoked from a few places, so passing the TimeMetric instance through (via regular or implicit argument) might not be as bad as it sounds. But that's only my gut feeling, did you give it a try yet?

mpollmeier avatar Mar 12 '24 09:03 mpollmeier

@pandurangpatil @mpollmeier Copying from https://discord.com/channels/832209896089976854/1215216562198290483/1216974330630639666

In finding a balance between static object and classes, let me propose that TimeMetric has both an object and class component, but that object TimeMetric keeps track of all the recorders in flight.

When a pass starts, all passes are given a Cpg, where the graph can be used as uniquely identifying a pass to a recorder. e.g. when a pass starts, it can check Am I recording? If yes, TimeMetric.pushStage(..., cpg.graph)

Then object TimeMetric could have some map of mutable.TrieMap[Graph, TimeMetric] so if a user is recording multiple CPG generations at the same time we won't modify the same global recorder

DavidBakerEffendi avatar Mar 12 '24 12:03 DavidBakerEffendi

@mpollmeier @DavidBakerEffendi I acknowledge both the suggestion. Let me give it a try and update here.

pandurangpatil avatar Mar 13 '24 13:03 pandurangpatil

@DavidBakerEffendi as discussed with one urgent requirement, I need to get this matrix logger deployed to the customer end to collect the precise data points in the absence of code access. So created this temporary arrangement with this PR - https://github.com/ShiftLeftSecurity/codepropertygraph/pull/1765 to intercept the starting and ending events.

I will work on finalising this PR as soon as I get free from the priority items.

cc: @mpollmeier

pandurangpatil avatar Apr 09 '24 17:04 pandurangpatil