node-clinic-flame icon indicating copy to clipboard operation
node-clinic-flame copied to clipboard

[PARKED] Collect timeout feature

Open DylanC opened this issue 5 years ago • 10 comments

Feature will allow a timeout to be passed as argument. This argument will be used to delay execution of collect regarding the 0x lib.

DylanC avatar Feb 06 '20 17:02 DylanC

So 0x has this file that turns a v8 profiler log into an array of stack traces called ticks. Each element in the array is the stack trace at a particular point in time. Those stack traces will be turned into the flame graph that we see at the end in Clinic Flame. To implement the timeout filtering, we would want to remove from the ticks array the stack traces that were captured before our timeout.

The V8 profiler output has JSON objects with some data about each captured stack trace, and that is turned into our ticks array here:

https://github.com/davidmarkclements/0x/blob/ea239f31379922451456518479bf442499426853/lib/v8-log-to-ticks.js#L122-L126

Here, the tick variable is one of V8's JSON objects.

I think we can check that the stack trace was recorded after our timeout, and only do the ticks.push(stack.reverse()) call if that is true. So we have to check if the tick object was generated after our timeout had passed.

We can look at how these tick objects look by manually generating a V8 profile:

# outputs an isolate-0x123456789-12345-v8.log file
node --prof examples/simple/index.js
# turn it into JSON
node --prof-process --preprocess -j isolate-*-v8.log > log.json

(The 0x repository contains a file examples/simple/index.js that you can use to try this out)

The log.json file that we created with the second command will contain a large JSON object. We're interested in the "ticks" property, which contains these tick objects. "ticks" contains an array that looks like the below:

"ticks": [
    {
  "tm": 23385,
  "vm": 0,
  "s": [
    1, 11388029, 1, 12561680, 36939, 150, 36938, 989, 36936, 113, 36935, 80, 36934, 35, 36933, 90
  ]
},
...
]

So each tick object in our code above is going to be an object like { tm: number, vm: number, s: [array of numbers] }.

  • s is our stack trace. to save space, it uses indices into the "code" array instead of strings. The addr.map((n) => codes[n]) line turns these indices into familiar stack trace strings.
  • vm is some v8 state value, idk what it's for but we don't use it :smile:
  • tm is the timestamp. This is the number of nanoseconds since the start of the VM.

So, we need to run the ticks.push(stack.reverse()) statement only if the tm value (which is expressed in nanoseconds) is larger than our desired delay (which is in milliseconds).

Does that clarify things a bit?

goto-bus-stop avatar Feb 11 '20 14:02 goto-bus-stop

@goto-bus-stop - A bit yeah. I'll look through the code base and make sense of how this is working regarding your comment.

DylanC avatar Feb 11 '20 15:02 DylanC

@goto-bus-stop - I have the parameter collectDelay filtering all the way down to my copy of 0x. https://github.com/DylanC/0x/tree/collect_timeout_feature

I also have some sudo code inside v8-log-to-ticks.js. Just need to implement a timestamp value for the collectDelay timeout and filter the results based on this value.

DylanC avatar Feb 12 '20 17:02 DylanC

@goto-bus-stop - Just investigating the CI failure. Its not due to linting but detect-port.test seems to be failing.

test/detect-port.test.js ........[eval]:1 throw new Error() ^ Error at [eval]:1:7 at Script.runInThisContext (vm.js:116:20) at Object.runInThisContext (vm.js:306:38) at Object. ([eval]-wrapper:9:26) at Module._compile (internal/modules/cjs/loader.js:955:30) at evalScript (internal/process/execution.js:80:25) at internal/main/eval_string.js:23:3 Target subprocess error, code: 1

DylanC avatar Feb 14 '20 11:02 DylanC

Tests fail locally for me with --collectDelay is not a recognised flag. CI installs 0x from npm and the version published on npm doesn't include that flag yet.

Normally we change the dependency in Flame's package.json to something like "0x": "your-github-username/0x#branchname" while the upstream 0x PR is still open, so npm installs 0x from your github fork. Once a new 0x version is published, we change it back to that version before merging the Flame PR.

goto-bus-stop avatar Feb 18 '20 10:02 goto-bus-stop

@goto-bus-stop - I have one error left to resolve for this change. In test/cmd-collect-delay.test.js:

test/detect-port.test.js ........[eval]:1 throw new Error() ^ Error at [eval]:1:7 at Script.runInThisContext (vm.js:122:20) at Object.runInThisContext (vm.js:329:38) at Object. ([eval]-wrapper:6:22) at Module._compile (internal/modules/cjs/loader.js:778:30) at evalScript (internal/bootstrap/node.js:589:27) at startup (internal/bootstrap/node.js:265:9) at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3) Target subprocess error, code: 1

DylanC avatar Mar 02 '20 11:03 DylanC

@DylanC that's actually expected output from a test that tests a failure case … would be good to silence that somehow :sweat_smile: Failures can be a bit obscured on CI because of the lengthy TAP output. I usually Ctrl+F for "not ok", which TAP prints when a test fails. Here, it looks like delayOneSecond is still found in the tree:

image

Timing on CI can be a real bother. Maybe the 300ms extra delay was still not enough… You could try adding a few more 100ms of wiggle room, if that doesn't work we may have to think of something smarter.

goto-bus-stop avatar Mar 02 '20 14:03 goto-bus-stop

@goto-bus-stop - Not sure what happened but the CI seems to be stuck. 3 did pass after the last collect delay increase.

DylanC avatar Mar 03 '20 10:03 DylanC

hmm yeah i'm looking at it too…seems like allsome of the other ones are still seeing the delayOneSecond node :/

there are two .DS_Store files in this PR still. could you get rid of them?

goto-bus-stop avatar Mar 03 '20 10:03 goto-bus-stop

Some of the other tests became(?) flaky? Working on that in https://github.com/nearform/node-clinic-flame/pull/108

goto-bus-stop avatar Mar 03 '20 10:03 goto-bus-stop