core-os-riscv

Re-implement process scheduling system

Open: skyzh opened this issue 4 years ago.

After investigating #8, I found that the issue is mainly caused by my redesign of the xv6 process scheduling system.

xv6's scheduling system

xv6 stores processes in a global array, and each element of that array has its own mutex. A process stays in its slot; it is never moved in and out of the array.
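As an illustration of this layout, here is a minimal hosted-Rust sketch, assuming `std::sync::Mutex` as a stand-in for a kernel spinlock. The names `ProcTable`, `ProcState`, `NPROC` and `alloc` are hypothetical and not taken from xv6 or core-os; the point is only that every slot carries its own lock and processes are mutated in place rather than moved.

```rust
use std::sync::Mutex;

/// Hypothetical process states, loosely mirroring xv6's `enum procstate`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProcState {
    Unused,
    Runnable,
    Running,
    Sleeping,
}

/// Hypothetical process control block; fields are illustrative only.
#[derive(Debug)]
pub struct Proc {
    pub pid: usize,
    pub state: ProcState,
}

/// Hypothetical table size for the sketch (xv6 itself uses NPROC = 64).
pub const NPROC: usize = 4;

/// Global process table: each slot owns its own lock, and a process
/// never leaves its slot.
pub struct ProcTable {
    pub slots: [Mutex<Proc>; NPROC],
}

impl ProcTable {
    pub fn new() -> Self {
        ProcTable {
            slots: std::array::from_fn(|i| {
                Mutex::new(Proc { pid: i, state: ProcState::Unused })
            }),
        }
    }

    /// Claim the first unused slot by locking only that slot,
    /// roughly in the spirit of xv6's allocproc() (simplified).
    pub fn alloc(&self) -> Option<usize> {
        for (i, slot) in self.slots.iter().enumerate() {
            let mut p = slot.lock().unwrap();
            if p.state == ProcState::Unused {
                p.state = ProcState::Runnable;
                return Some(i);
            }
            // The slot's lock is released when `p` goes out of scope.
        }
        None
    }
}
```

Because each slot is locked independently, two harts can inspect different processes at the same time without a global lock.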

My implementation

There's also a process array called PROCS_POOL in core-os. However, to satisfy Rust's ownership rules, I made some changes to the scheduling system.

When a process is to be scheduled on a hart, the scheduler swaps the process object out of PROCS_POOL and into the CPU object. This way, the CPU has full ownership of the process object. That's how I addressed the ownership issue at an early stage. As a result, there's no need to add a mutex to every process, since only the current hart owns the process object.
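The ownership transfer described above can be sketched with `std::mem::swap` on `Option` slots. This is a hypothetical model, not the actual core-os code: `Proc`, `Cpu`, `swap_in` and `swap_out` are made-up names, and the pool is a plain slice rather than the real PROCS_POOL. After `swap_in`, the pool slot holds `None` and the `Cpu` is the only owner of the process, which is why no per-process mutex is needed in this design.

```rust
use std::mem;

/// Hypothetical process object.
#[derive(Debug)]
pub struct Proc {
    pub pid: usize,
}

/// Hypothetical per-hart CPU object that owns the running process.
#[derive(Default)]
pub struct Cpu {
    pub proc: Option<Proc>,
}

/// Move the process in `pool[idx]` into the CPU.
/// Returns false if the hart is busy or the slot is empty.
pub fn swap_in(cpu: &mut Cpu, pool: &mut [Option<Proc>], idx: usize) -> bool {
    if cpu.proc.is_some() || pool[idx].is_none() {
        return false;
    }
    // Afterwards `pool[idx]` is None and `cpu.proc` owns the process.
    mem::swap(&mut cpu.proc, &mut pool[idx]);
    true
}

/// Put the CPU's process back into `pool[idx]`.
/// Returns false if the hart holds nothing or the slot is occupied.
pub fn swap_out(cpu: &mut Cpu, pool: &mut [Option<Proc>], idx: usize) -> bool {
    if cpu.proc.is_none() || pool[idx].is_some() {
        return false;
    }
    mem::swap(&mut cpu.proc, &mut pool[idx]);
    true
}
```

The catch is that "is a process running on this hart?" becomes `cpu.proc.is_some()`, which is already true in the window between `swap_in` and the actual context switch.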

However, I have always had doubts about this implementation, because:

  • Traps: exceptions and interrupts can change control flow at any point. Once a timer-based scheduler (i.e. preemptive scheduling) is in place, the kernel may be interrupted at any time, for example while a context switch is in progress. If a trap happens then, checking whether a process is running on the CPU can give the wrong answer.
  • Sleep locks (the equivalent of condition variables in pthreads): as mentioned in #2, I proposed a global lock to indicate whether any kernel thread is in the process of being put back into the pool.
  • Frequent kernel panics in #8: after 5 hours of debugging, I found that this issue is caused by traps that occur during scheduling. If the hart is running the scheduler thread and a timer interrupt arrives after the process has been swapped into the CPU but before the scheduler has called swtch, the kernel assumes that a process is running on the CPU and tries to call into the scheduler thread. After a few failed attempts to fix this, I gave up on the current scheduling system.
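The race in the last point can be modelled as a small state machine. This sketch is hypothetical: `Hart`, `TimerAction` and `on_timer` are illustrative names, and interrupt masking is modelled with a nesting counter patterned on xv6's `push_off`/`pop_off`. In xv6, the scheduler acquires `p->lock` (which disables interrupts) before setting `c->proc` and keeps it held across `swtch`, so a timer tick cannot land in the window where the CPU record is set but the switch has not happened.

```rust
/// Hypothetical per-hart state used to model the scheduling race.
#[derive(Default)]
pub struct Hart {
    /// Set as soon as the scheduler has swapped a process into the CPU.
    pub has_proc: bool,
    /// Set once `swtch` has actually transferred control to the process.
    pub switched: bool,
    /// Nesting depth of interrupt-disable calls (models push_off/pop_off).
    pub noff: u32,
}

impl Hart {
    pub fn push_off(&mut self) {
        self.noff += 1;
    }
    pub fn pop_off(&mut self) {
        assert!(self.noff > 0, "pop_off without matching push_off");
        self.noff -= 1;
    }
    pub fn intr_enabled(&self) -> bool {
        self.noff == 0
    }
}

/// What a timer interrupt would see at the current moment.
#[derive(Debug, PartialEq, Eq)]
pub enum TimerAction {
    /// Interrupts are masked, so the tick is deferred.
    Deferred,
    /// A process is really running: yield back to the scheduler.
    Yield,
    /// Process swapped in but not yet switched to: a handler that only
    /// checks `has_proc` would wrongly try to yield here.
    Inconsistent,
    /// The scheduler is running with no process: nothing to do.
    Ignore,
}

pub fn on_timer(h: &Hart) -> TimerAction {
    if !h.intr_enabled() {
        return TimerAction::Deferred;
    }
    match (h.has_proc, h.switched) {
        (true, true) => TimerAction::Yield,
        (true, false) => TimerAction::Inconsistent,
        _ => TimerAction::Ignore,
    }
}
```

Wrapping the "swap in, then switch" sequence in `push_off`/`pop_off` turns the `Inconsistent` case into `Deferred`, which is the property the xv6-style design relies on.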

In the end, I'll implement a new scheduling system as described in xv6 and, at the same time, adopt the async-style kernel threads from #1. This issue will be resolved in the next big refactor and milestone of core-os. For now, I'm taking a break to focus on my course projects.
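For reference, an xv6-style scheduler pass might look roughly like the hosted-Rust sketch below. It is an assumption-laden model, not the planned core-os code: `std::sync::Mutex` stands in for a spinlock, the `run` closure stands in for `swtch` into the process and back, and `State`, `Proc`, `Cpu` and `scheduler_pass` are hypothetical names. Unlike the swap-based design, the CPU records only the index of the running process, and the slot's lock is held across the "switch", as xv6 holds `p->lock` across `swtch`.

```rust
use std::sync::Mutex;

/// Hypothetical process states for the sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Unused,
    Runnable,
    Running,
}

/// Hypothetical process control block; it never leaves its table slot.
#[derive(Debug)]
pub struct Proc {
    pub pid: usize,
    pub state: State,
}

/// Hypothetical CPU record: stores only the index of the running process.
#[derive(Default)]
pub struct Cpu {
    pub current: Option<usize>,
}

/// One pass over the process table. `run` models `swtch` into the process;
/// it returns when the process gives up the CPU. Returns how many ran.
pub fn scheduler_pass<F>(table: &[Mutex<Proc>], cpu: &mut Cpu, mut run: F) -> usize
where
    F: FnMut(&mut Proc),
{
    let mut ran = 0;
    for (i, slot) in table.iter().enumerate() {
        let mut p = slot.lock().unwrap();
        if p.state == State::Runnable {
            p.state = State::Running;
            cpu.current = Some(i);
            // The slot's lock stays held for the whole "switch".
            run(&mut *p);
            cpu.current = None;
            ran += 1;
        }
        // The slot's lock is released here when `p` is dropped.
    }
    ran
}
```

In a real kernel the process itself would release and reacquire the lock around its time slice (xv6 does this in `yield` and `forkret`); the closure here collapses that into a single call.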

For now, I just check whether the scheduler thread is running by testing whether its context is zero. Since there is no locking mechanism, a kernel panic may still occur, though with very low probability. It just works.

skyzh, Feb 28 '20 15:02