
cpuTime appears to decrease

Open SomeHats opened this issue 3 years ago • 5 comments

Hello! Thanks for all your awesome work on isolated-vm - it's super cool.

I've noticed some cases where cpuTime appears to decrease between accesses. The docs include this disclaimer:

Note that CPU time may vary drastically if there is contention for the CPU. This could occur if other processes are trying to do work, or if you have more than require('os').cpus().length isolates currently doing work in the same nodejs process.

I don't fully understand exactly how CPU time is measured, but does 'vary drastically' mean that we could in some cases expect the returned time to drop by hundreds of milliseconds between reads of cpuMs? Do you know if there's anything we can do to increase the accuracy of these measurements?

Thanks!

SomeHats, Oct 15 '20 13:10

Is this on Linux, or another operating system?

laverdet, Oct 15 '20 17:10

Yes, it's on Linux: specifically Amazon Linux in an AWS Lambda function.

SomeHats, Oct 15 '20 22:10

cpuTime on non-Linux systems is calculated from the difference in wall time between work start and work end in an isolate. This is, of course, strictly larger than the actual CPU time allotted by the OS. On Linux there's an easy way to measure actual CPU time, so that's special-cased. This is in place because in aggregate it provides a precise measurement of the total resources used by an isolate, so you could use this value to bill your clients / users for the CPU that they used.
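To make the gap between wall time and CPU time concrete, here is a minimal Node.js sketch that uses only built-in globals. It is not isolated-vm code: `sleepMs` and `measure` are hypothetical helpers, and `process.cpuUsage()` reports CPU for the whole process rather than a single thread, so it only approximates what a per-thread CPU clock would show.

```javascript
// Block the current thread for roughly `ms` milliseconds without burning CPU.
function sleepMs(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms); // times out because nothing ever notifies the cell
}

// Measure wall-clock time and process CPU time (both in ms) spent inside fn.
function measure(fn) {
  const wallStart = process.hrtime.bigint();
  const cpuStart = process.cpuUsage();
  fn();
  const cpu = process.cpuUsage(cpuStart); // delta in microseconds (user + system)
  const wallNs = process.hrtime.bigint() - wallStart;
  return {
    wallMs: Number(wallNs) / 1e6,
    cpuMs: (cpu.user + cpu.system) / 1000,
  };
}

// An idle wait: wall time advances by about 200 ms while very little CPU is used.
const idle = measure(() => sleepMs(200));
console.log(idle);
```

A wall-time delta counts the time a thread spends waiting or descheduled, which is why a wall-based figure overstates the CPU actually consumed.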

The problem is that you can only query the CPU clock from the current thread. So if you query cpuTime while that isolate is currently running work, it will estimate the CPU time using a wall time delta. This would account for cpuTime ticking up a little faster than it should, and then resetting lower when it has a chance to make an accurate measurement.
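The behaviour described above can be illustrated with a toy model. This is a sketch of the idea only, not isolated-vm's implementation: the class `CpuTimeModel`, its methods, and the millisecond figures are all hypothetical.

```javascript
// Toy model: accurate CPU time is only recorded when a unit of work ends;
// reads made while work is in flight fall back to a wall-time estimate.
class CpuTimeModel {
  constructor() {
    this.accurateMs = 0;         // sum of accurately measured CPU time
    this.workStartWallMs = null; // wall-clock start of in-flight work, if any
  }
  startWork(wallNowMs) {
    this.workStartWallMs = wallNowMs;
  }
  endWork(actualCpuMs) {
    // Work is finished, so the true CPU time for it can be recorded.
    this.accurateMs += actualCpuMs;
    this.workStartWallMs = null;
  }
  read(wallNowMs) {
    if (this.workStartWallMs === null) return this.accurateMs;
    // Work is still running: estimate with the wall time elapsed so far.
    return this.accurateMs + (wallNowMs - this.workStartWallMs);
  }
}

const model = new CpuTimeModel();
model.startWork(0);
const during = model.read(500); // read while busy: wall-time estimate
model.endWork(200);             // the thread only received 200 ms of CPU
const after = model.read(600);  // read while idle: accurate total
console.log(during, after);     // prints: 500 200
```

In this made-up scenario the isolate was descheduled for most of the interval, so the in-flight estimate (500) overshoots and the next idle read (200) appears to go backwards.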

There's an improvement that could be made here where querying cpuTime from the current isolate would return an accurate result. I'll leave this open to make a note to implement that in the future.

laverdet, Oct 16 '20 05:10

I see, thank you so much for the detailed response. Just to confirm my understanding:

  • When not on Linux, the value returned from cpuTime overestimates the actual resources used, but is consistent between reads because it always uses the same measurement approach.
  • When on Linux, the value returned from cpuTime is often much more accurate, but the measurement approach depends on when it's called (i.e. whether the isolate is currently running work), and the fallback approach can return significantly different values.

Does that sound accurate to you? How does the non-Linux measurement approach differ (if at all) from the wall-clock fallback used on Linux? My assumption is that if we were to disable USE_CLOCK_THREAD_CPUTIME_ID on Linux we'd see higher cpuTime values, similar to the ones we get when accessing it whilst the isolate is running work, but those values would increase in a more predictable way.
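If a consumer only needs readings that never go backwards, one possible workaround (not something proposed in this thread) is to keep a high-water mark on the caller's side. The sketch below assumes a hypothetical `readCpuMs` function that returns the latest reading as a number. It hides the decreases but does not make the underlying measurement any more accurate.

```javascript
// Wrap a reader so that successive calls never return a smaller value.
function makeMonotonic(readCpuMs) {
  let highWater = -Infinity;
  return function readMonotonic() {
    highWater = Math.max(highWater, readCpuMs());
    return highWater;
  };
}

// Usage with canned readings standing in for real cpuTime samples.
const samples = [100, 500, 200, 650];
let index = 0;
const read = makeMonotonic(() => samples[index++]);
const observed = samples.map(() => read());
console.log(observed); // prints: [ 100, 500, 500, 650 ]
```

The clamped series holds an overestimated value until the accurate total catches up, so it suits monitoring better than billing.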

What would the approach be for your improvement for querying accurate cpuTime from the current isolate? I guess you'd need to interrupt the isolate that's doing work in order to make a measurement.

SomeHats, Oct 16 '20 13:10

Your understanding seems correct to me.

The improvement I mentioned would only be relevant to accessing cpuTime from the current isolate. So you would do await context.global.set('isolate', isolate), and then from within that isolate you could observe your own CPU time with isolate.cpuTime. I'm not sure if that would help you or not, but it's something that could be done and would always be accurate.

laverdet, Oct 17 '20 18:10