core-os-riscv
Re-implement process scheduling system
After investigating #8, I found that the issue is mainly due to my redesign of xv6's process scheduling system.
xv6's scheduling system
xv6 stores processes in a global array. Each element of this array holds its own lock. A process stays in its slot; it is never moved back and forth.
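A minimal sketch of that layout, written for a hosted Rust environment rather than the kernel. `ProcTable`, `NPROC`, `State` and `pick_runnable` are illustrative names, not taken from xv6 or core-os, and `std::sync::Mutex` stands in for the spinlock a kernel would use.

```rust
use std::sync::Mutex;

const NPROC: usize = 4; // hypothetical table size

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Unused,
    Runnable,
    Running,
}

struct Proc {
    pid: usize,
    state: State,
}

// One lock per slot. A process is updated in place and never leaves its slot.
struct ProcTable {
    slots: [Mutex<Proc>; NPROC],
}

impl ProcTable {
    fn new() -> Self {
        ProcTable {
            slots: std::array::from_fn(|i| {
                Mutex::new(Proc { pid: i, state: State::Unused })
            }),
        }
    }

    // Scan the table, mark the first runnable process as running
    // while holding only that slot's lock, and return its index.
    fn pick_runnable(&self) -> Option<usize> {
        for (i, slot) in self.slots.iter().enumerate() {
            let mut p = slot.lock().unwrap();
            if p.state == State::Runnable {
                p.state = State::Running;
                return Some(i);
            }
        }
        None
    }
}
```

Because every access goes through the slot's lock, any hart can inspect any process, at the cost of one lock per process.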
My implementation
There's also a process array called `PROCS_POOL` in core-os. However, in order to meet the ownership requirements in Rust, I made some changes to the scheduling system.
If a process is to be scheduled on a hart, the scheduler swaps the process object out of `PROCS_POOL` and into the CPU object. This way, the CPU has full ownership of the process object. That's how I addressed the ownership issue in the early stage. As a result, there's no need to add a mutex to every process, since only the current hart owns the process object.
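The swap described above can be sketched roughly as follows. This is not the actual core-os code: `Pool`, `Cpu`, `take_from` and `put_back` are hypothetical names, and the pool is a plain `Vec` here instead of a global.

```rust
struct Proc {
    pid: usize,
}

// Hypothetical stand-in for the per-hart CPU object.
struct Cpu {
    proc: Option<Proc>,
}

// Hypothetical stand-in for the process pool.
struct Pool {
    slots: Vec<Option<Proc>>,
}

impl Cpu {
    // Move the process in `slot` out of the pool and into this CPU.
    // Afterwards the pool slot is empty and the CPU is the sole owner.
    fn take_from(&mut self, pool: &mut Pool, slot: usize) -> bool {
        if self.proc.is_some() || pool.slots[slot].is_none() {
            return false;
        }
        std::mem::swap(&mut self.proc, &mut pool.slots[slot]);
        true
    }

    // Move the process back into an empty pool slot.
    fn put_back(&mut self, pool: &mut Pool, slot: usize) -> bool {
        if self.proc.is_none() || pool.slots[slot].is_some() {
            return false;
        }
        std::mem::swap(&mut self.proc, &mut pool.slots[slot]);
        true
    }
}
```

Since the `Proc` value lives in exactly one place at a time, the borrow checker is satisfied without a per-process lock. The price is a window in which the process sits in the CPU object but is not yet running.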
However, I keep questioning this implementation, for the following reasons:
- Traps: exceptions and interrupts always change the control flow. With a timer-based (i.e. preemptive) scheduler, the kernel may be interrupted at any time, for example while a context switch is in progress. If a trap happens at that point, there will be problems when checking whether there's a running process on the CPU.
- Sleep locks (the equivalent of condition variables in pthreads): as mentioned in #2, I proposed a global lock to indicate whether any kernel thread is in the process of being put back into the pool.
- Frequent kernel panics in #8: after 5 hours of debugging, I figured out that this issue is caused by traps happening during scheduling. If the hart is running the scheduler thread and a timer interrupt arrives after the process has been swapped into the CPU but before the scheduler has called `swtch`, the kernel assumes that a process is running on the CPU and tries to call the scheduler thread. After a few failed attempts, I gave up on the current scheduling system.
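The race in the last item can be modelled in a few lines. This is a simplified model, not kernel code: `Cpu`, `TimerAction`, `naive_timer_check` and the `switched` flag are all made up for illustration.

```rust
#[derive(Debug, PartialEq)]
enum TimerAction {
    Ignore,
    YieldToScheduler,
}

// Simplified model of the per-hart state.
struct Cpu {
    proc: Option<usize>, // pid of the process swapped into this CPU
    switched: bool,      // true once swtch has handed control to it
}

// Naive check: "a process is in the CPU object, so it must be running".
fn naive_timer_check(cpu: &Cpu) -> TimerAction {
    if cpu.proc.is_some() {
        TimerAction::YieldToScheduler
    } else {
        TimerAction::Ignore
    }
}

// The problematic case: the handler decides to yield although the hart
// is still executing the scheduler thread.
fn is_misfire(cpu: &Cpu) -> bool {
    naive_timer_check(cpu) == TimerAction::YieldToScheduler && !cpu.switched
}
```

A timer interrupt that arrives with `proc` set and `switched` still false lands in exactly this window: the handler tries to yield to the scheduler from inside the scheduler.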
In the end, I'll implement a new scheduling system as described in xv6, and meanwhile adapt the async-style kernel threads from #1. This issue will be resolved in the next big refactor and milestone of core-os. For now, I'm taking a break to focus on my course projects.
For now, I just check whether the scheduler thread is running by testing whether `context` is zero. As there are no locking mechanisms, a kernel panic may still occur with very low probability. It just works.
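One way such a check could look is sketched below. This is a guess at the shape of the workaround, not the actual core-os code: it assumes the CPU object keeps a saved `context` that stays all-zero until the scheduler has switched away, and the `Context` fields and method names are hypothetical.

```rust
// Hypothetical, trimmed-down register context.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Context {
    ra: usize,
    sp: usize,
}

struct Cpu {
    proc: Option<usize>, // pid of the process swapped into this CPU
    context: Context,    // assumed to stay zeroed while the scheduler runs
}

impl Cpu {
    // Assumption: an all-zero context means the scheduler thread has not
    // switched away yet, so the hart is still running the scheduler.
    fn scheduler_is_running(&self) -> bool {
        self.context == Context::default()
    }

    // Only yield on a timer interrupt when a process is present and the
    // scheduler thread is not the one being interrupted.
    fn should_yield_on_timer(&self) -> bool {
        self.proc.is_some() && !self.scheduler_is_running()
    }
}
```

Nothing here is atomic, so an interrupt arriving while the context is being written can still observe an inconsistent state, which matches the low-probability panic mentioned above.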