
JCoz sometimes reports strange delays in experiment results

Open • AlexVanGogen opened this issue 5 years ago • 1 comment

The total delay reported in an experiment result is sometimes implausible. For example, baseline experiments can show non-zero delays, and in one case the reported delay exceeded the entire experiment duration.

It looks like, in those cases, a thread never handles the signal that resets its thread-local delay. Such a thread may spend so long handling the last signal it received during an experiment that the next experiment has already been prepared by the time it finishes. The next experiment then runs with stale thread-local delays, which corrupt the global delay, and any thread that has already zeroed its local delay then sleeps in the signal handler even though it is not supposed to.

AlexVanGogen • Aug 18 '20 22:08

I think the solution here is to use a thread barrier between signaled user threads and the agent thread running an experiment.

One simple way to do this is to add an atomic counter. The agent thread initializes it to 0 before signaling, each user thread atomically increments it just before returning from the signal handler, and the agent thread waits until it reaches the number of signaled threads before returning from signal_user_threads.

Byte-Lab • Aug 19 '20 14:08