node-clinic-flame
[PARKED] Collect timeout feature
This feature will allow a timeout to be passed as an argument. The argument will be used to delay the start of collection in the underlying 0x library.
So 0x has this file that turns a V8 profiler log into an array of stack traces called `ticks`. Each element in the array is the stack trace at a particular point in time. Those stack traces will be turned into the flame graph that we see at the end in Clinic Flame. To implement the timeout filtering, we would want to remove from the `ticks` array the stack traces that were captured before our timeout.
The V8 profiler output has JSON objects with some data about each captured stack trace, and that is turned into our `ticks` array here:
https://github.com/davidmarkclements/0x/blob/ea239f31379922451456518479bf442499426853/lib/v8-log-to-ticks.js#L122-L126
Here, the `tick` variable is one of V8's JSON objects.
I think we can check that the stack trace was recorded after our timeout, and only do the `ticks.push(stack.reverse())` call if that is true. So we have to check if the `tick` object was generated after our timeout had passed.
We can look at how these `tick` objects look by manually generating a V8 profile:

```sh
# outputs an isolate-0x123456789-12345-v8.log file
node --prof examples/simple/index.js
# turn it into JSON
node --prof-process --preprocess -j isolate-*-v8.log > log.json
```
(The 0x repository contains a file `examples/simple/index.js` that you can use to try this out.)
The `log.json` file that we created with the second command will contain a large JSON object. We're interested in the `"ticks"` property, which contains these `tick` objects. `"ticks"` contains an array that looks like the below:

```json
"ticks": [
  {
    "tm": 23385,
    "vm": 0,
    "s": [
      1, 11388029, 1, 12561680, 36939, 150, 36938, 989, 36936, 113, 36935, 80, 36934, 35, 36933, 90
    ]
  },
  ...
]
```
So each `tick` object in our code above is going to be an object like `{ tm: number, vm: number, s: [array of numbers] }`.
- `s` is our stack trace. To save space, it uses indices into the `"code"` array instead of strings. The `addr.map((n) => codes[n])` line turns these indices into familiar stack trace strings.
- `vm` is some V8 state value, idk what it's for but we don't use it :smile:
- `tm` is the timestamp. This is the number of nanoseconds since the start of the VM.
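To make the index-based stack concrete, here is a toy decoding step. The `codes` table and values below are made up for illustration (the real preprocessed log stores more per entry, and `s` interleaves other values), but the mapping idea is the same as `addr.map((n) => codes[n])`:

```javascript
// Toy example: decode an array of code-table indices into frame names.
// `codes` and `addr` are invented here purely to illustrate the lookup.
const codes = [
  { name: 'node::Start' },
  { name: 'main' },
  { name: 'doWork' }
]

const addr = [2, 1, 0] // innermost frame first

// Turn indices into familiar stack trace strings.
const stack = addr.map((n) => codes[n].name)

console.log(stack) // → [ 'doWork', 'main', 'node::Start' ]
```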
So, we need to run the `ticks.push(stack.reverse())` statement only if the `tm` value (which is expressed in nanoseconds) is larger than our desired delay (which is in milliseconds).
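A minimal sketch of that check, as a standalone filter. The function name and the unit conversion assume `tm` is nanoseconds since VM start and the delay is milliseconds, per the description above; this is illustrative, not the actual 0x implementation:

```javascript
// Hypothetical filter: keep only ticks captured after the delay has passed.
// Assumes tick.tm is nanoseconds since VM start, collectDelayMs is milliseconds.
function filterTicksByDelay (ticks, collectDelayMs) {
  const delayNs = collectDelayMs * 1e6 // 1 ms = 1,000,000 ns
  return ticks.filter((tick) => tick.tm >= delayNs)
}

const ticks = [
  { tm: 500000, vm: 0, s: [] },  // 0.5 ms after VM start
  { tm: 2500000, vm: 0, s: [] }, // 2.5 ms
  { tm: 9000000, vm: 0, s: [] }  // 9 ms
]

console.log(filterTicksByDelay(ticks, 2).length) // → 2
```

In the real code the same condition would guard the `ticks.push(stack.reverse())` call instead of running as a separate pass.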
Does that clarify things a bit?
@goto-bus-stop - A bit, yeah. I'll look through the code base and make sense of how this works based on your comment.
@goto-bus-stop - I have the parameter collectDelay filtering all the way down to my copy of 0x: https://github.com/DylanC/0x/tree/collect_timeout_feature
I also have some pseudo code inside `v8-log-to-ticks.js`. I just need to implement a timestamp value for the collectDelay timeout and filter the results based on this value.
@goto-bus-stop - Just investigating the CI failure. It's not due to linting, but `detect-port.test` seems to be failing.
```
test/detect-port.test.js ........
[eval]:1
throw new Error()
^
Error
    at [eval]:1:7
    at Script.runInThisContext (vm.js:116:20)
    at Object.runInThisContext (vm.js:306:38)
    at Object.<anonymous> ([eval]-wrapper:9:26)
    at Module._compile (internal/modules/cjs/loader.js:955:30)
    at evalScript (internal/process/execution.js:80:25)
    at internal/main/eval_string.js:23:3
Target subprocess error, code: 1
```
Tests fail locally for me with `--collectDelay is not a recognised flag`. CI installs 0x from npm and the version published on npm doesn't include that flag yet.
Normally we change the dependency in Flame's package.json to something like `"0x": "your-github-username/0x#branchname"` while the upstream 0x PR is still open, so npm installs 0x from your GitHub fork. Once a new 0x version is published, we change it back to that version before merging the Flame PR.
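For example, the dependencies section would then look something like this (the username and branch name are placeholders):

```json
{
  "dependencies": {
    "0x": "your-github-username/0x#branchname"
  }
}
```

npm resolves `user/repo#branch` shorthand to a GitHub tarball, so no registry publish is needed while testing.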
@goto-bus-stop - I have one error left to resolve for this change. In `test/cmd-collect-delay.test.js`:

```
test/detect-port.test.js ........
[eval]:1
throw new Error()
^
Error
    at [eval]:1:7
    at Script.runInThisContext (vm.js:122:20)
    at Object.runInThisContext (vm.js:329:38)
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at evalScript (internal/bootstrap/node.js:589:27)
    at startup (internal/bootstrap/node.js:265:9)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)
Target subprocess error, code: 1
```
@DylanC that's actually expected output from a test that tests a failure case … would be good to silence that somehow :sweat_smile:
Failures can be a bit obscured on CI because of the lengthy TAP output. I usually Ctrl+F for "not ok", which TAP prints when a test fails. Here, it looks like `delayOneSecond` is still found in the tree:
Timing on CI can be a real bother. Maybe the 300ms extra delay was still not enough… You could try adding a few more 100ms of wiggle room, if that doesn't work we may have to think of something smarter.
@goto-bus-stop - Not sure what happened but the CI seems to be stuck. 3 did pass after the last collect delay increase.
hmm yeah i'm looking at it too… seems like some of the other ones are still seeing the `delayOneSecond` node :/
there are two `.DS_Store` files in this PR still. could you get rid of them?
Some of the other tests became(?) flaky? Working on that in https://github.com/nearform/node-clinic-flame/pull/108