
Resolve race condition in scheduler (Issue #83)

Open darshand15 opened this issue 5 months ago • 1 comment

As part of my research under the guidance of @shwestrick, we identified a race condition in the scheduler initialization, elaborated in Issue #83. This PR resolves that issue.

The proposed solution is as follows:

  • Within wait_for_work, first announce that the current worker is being put to sleep by decrementing num_awake_workers
  • However, before actually sleeping through atomic_wait, perform one further round of steals:
    • If a job is found: increment num_awake_workers, and start working on the job without atomic_wait
    • If no job is found: perform atomic_wait

Execution order of loading wake_up_counter and decrementing num_awake_workers

  • While implementing the above solution in the wait_for_work function, we observed that the relative order of loading wake_up_counter and decrementing num_awake_workers affects the correctness of the scheduler logic
  • The current parlaylib implementation decrements num_awake_workers first and then loads wake_up_counter
  • Our analysis showed that the reverse order, loading wake_up_counter first and then decrementing num_awake_workers, appears to be the correct one
  • Load_decrement_order_issue.pdf demonstrates a few scenarios, including a counter-example to highlight the requirement for this ordering
  • Informal_Proof_load_decrement_order.pdf details an informal proof to show why this proposed ordering is correct

Verification of Proposed Solution:

  • Functional Testing:

    • We verified that the proposed solution passes all parlaylib unit tests
    • We re-ran the motivating example that originally triggered the race condition (the benchmark from Issue #83) to confirm that the proposed solution in fact resolves it
    • The results below were generated with 10000 as the command-line parameter, with additional instrumentation code measuring the working, stealing, and sleeping times of each thread in the scheduler:
    Original Implementation:
    End to End Wall Clock Time Taken: 30180438714 nanoseconds (1 thread)
    End to End Wall Clock Time Taken: 30182756851 nanoseconds (2 threads)
    
    Details for execution with 2 threads:
    Thread 0: 22633538008, 0, 0 (Working, Stealing, Sleeping Times in ns)
    Thread 1: 0, 358, 22632559285 (Working, Stealing, Sleeping Times in ns)
    
    =====================================================================
    Scheduler Fix Implementation:
    End to End Wall Clock Time Taken: 30191433263 nanoseconds (1 thread)
    End to End Wall Clock Time Taken: 22649582353 nanoseconds (2 threads)
    
    Details for execution with 2 threads:
    Thread 0: 15086054897, 7008443, 5285508 (Working, Stealing, Sleeping Times in ns)
    Thread 1: 7551540446, 6056362, 7539810894 (Working, Stealing, Sleeping Times in ns)
    
    • With the fix, both threads now share the work and the two-thread wall-clock time drops accordingly, showing that the race condition has been resolved
    • We also attempted to reproduce the race condition by adding the benchmark to parlaylib's Google Benchmark suite; however, the race was not triggered there, so those commits have not been included in this PR.
  • Performance Testing:

    • We executed the parlaylib benchmark suite and compared the results for the original implementation against the one with the scheduler fix
    • The detailed results, along with the % Time Deltas and the Time Ratios (Scheduler Fix Time/Original Time), are captured in Benchmark_Results.pdf
    • The Geometric Mean of the Real Time Ratios was 0.980, i.e., an average 2% improvement in Real Time with the fix
    • The Geometric Mean of the CPU Time Ratios was 0.981, i.e., an average 1.9% improvement in CPU Time with the fix

darshand15 · Jul 21 '25 22:07