isolated-vm
cpuTime appears to decrease
Hello! Thanks for all your awesome work on isolated-vm - it's super cool.
I've noticed some cases where cpuTime appears to decrease between accesses. I notice in the docs there's this disclaimer:
Note that CPU time may vary drastically if there is contention for the CPU. This could occur if other processes are trying to do work, or if you have more than require('os').cpus().length isolates currently doing work in the same nodejs process.
I don't fully understand how CPU time is measured, but does 'vary drastically' mean that in some cases we could expect the returned time to drop by hundreds of milliseconds between reads of cpuMs? Do you know if there's anything we can do to increase the accuracy of these measurements?
Thanks!
Is this on Linux, or another operating system?
Yes, it’s on Linux - Amazon Linux in an AWS Lambda function specifically.
cpuTime on non-Linux systems is calculated as the difference in wall time between work start and work end in an isolate. This is, of course, strictly larger than the actual CPU time allotted by the OS. On Linux there's an easy way to measure actual CPU time, so that's special-cased. This is in place because in aggregate it provides a precise measurement of the total resources used by an isolate. So you could use this variable to bill your clients / users for the CPU that they used.
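To make the gap between wall time and CPU time concrete, here is a rough Node.js sketch. It is not how isolated-vm measures anything: it uses process.cpuUsage(), which is process-wide rather than per-thread, and the helper names (measure, sleepSync, spin) are made up for illustration. The point is only that a wall-time delta keeps growing while a thread is idle, whereas CPU time does not.

```javascript
// Measure wall time and process-wide CPU time consumed while fn() runs.
// Illustrative only: isolated-vm's own accounting is per isolate, not per process.
function measure(fn) {
  const wallStart = process.hrtime.bigint(); // nanoseconds, monotonic
  const cpuStart = process.cpuUsage();       // { user, system } in microseconds
  fn();
  const cpu = process.cpuUsage(cpuStart);    // delta since cpuStart
  const wallNs = process.hrtime.bigint() - wallStart;
  return {
    wallMillis: Number(wallNs) / 1e6,
    cpuMillis: (cpu.user + cpu.system) / 1000,
  };
}

// Block the current thread without burning CPU.
function sleepSync(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms);
}

// Burn CPU for roughly `ms` milliseconds of wall time.
function spin(ms) {
  const end = process.hrtime.bigint() + BigInt(ms) * 1000000n;
  while (process.hrtime.bigint() < end) { /* busy loop */ }
}

const idle = measure(() => sleepSync(200));
const busy = measure(() => spin(200));
console.log('idle:', idle); // wall time near 200 ms, CPU time much smaller
console.log('busy:', busy); // wall time and CPU time both substantial
```

An estimate based on wall time therefore over-counts whenever the measured thread is waiting rather than executing.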
The problem is that you can only query the CPU clock from the current thread. So if you query cpuTime while that isolate is currently running work, it will estimate the CPU time using a wall time delta. This would account for cpuTime ticking up a little faster than it should, and then resetting lower when it has a chance to make an accurate measurement.
There's an improvement that could be made here where querying cpuTime from the current isolate would return an accurate result. I'll leave this open to make a note to implement that in the future.
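If a caller only needs readings that never go backwards, one possible workaround is to clamp them on the caller's side. The sketch below is an assumption-laden illustration, not part of isolated-vm: makeMonotonicReader and readCpuNs are hypothetical names, and it assumes readings arrive as bigint nanoseconds. The clamped value can still be an over-estimate until the next accurate measurement catches up.

```javascript
// Wrap a reader so that successive readings never decrease.
// `readCpuNs` is a hypothetical caller-supplied function returning a bigint,
// for example () => isolate.cpuTime if your version returns a bigint.
function makeMonotonicReader(readCpuNs) {
  let highest = 0n;
  return function read() {
    const current = readCpuNs();
    if (current > highest) {
      highest = current;
    }
    return highest;
  };
}

// Simulated readings: an estimate that overshoots (250n), followed by a
// lower, accurate measurement (180n).
const samples = [100n, 250n, 180n, 300n];
let nextIndex = 0;
const read = makeMonotonicReader(() => samples[nextIndex++]);
const observed = samples.map(() => read());
console.log(observed.map(String).join(', ')); // 100, 250, 250, 300
```

This hides the apparent decrease but does not make the in-flight estimate any more accurate.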
I see, thank you so much for the detailed response. Just to confirm my understanding:
- When not on Linux, the value returned from cpuTime will be over-estimated from the actual resources used, but will return consistent values as it always uses the same approach to measurement
- When on Linux, the value returned from cpuTime is often much more accurate, but switches to a different measurement approach depending on when it's called (i.e. if the isolate is currently running work), where the fallback measurement approach can return significantly different values.
Does that sound accurate to you? How does the non-Linux measurement approach differ (if at all) from the wall-clock fallback used by Linux? My assumption is that if we were to disable USE_CLOCK_THREAD_CPUTIME_ID on Linux we'd see higher cpuTime values, similar to the ones we get when accessing it whilst the isolate is running work, but those values would increase in a more predictable way.
What would the approach be for your improvement for querying accurate cpuTime from the current isolate? I guess you'd need to interrupt the isolate that's doing work in order to make a measurement.
Your understanding seems correct to me.
The improvement I mentioned would only be relevant to accessing cpuTime from the current isolate. So you do await context.global.set('isolate', isolate) and then from within that isolate you could observe your own CPU time with isolate.cpuTime. I'm not sure if that would help you or not, but it's something that could be done that would always be accurate.