libdragon icon indicating copy to clipboard operation
libdragon copied to clipboard

First kernel prototype

Open rasky opened this issue 3 years ago • 16 comments

This PR introduces a multithreading kernel to libdragon. Having threads is a very good way to make good use of the limited resources available on the console as it makes it easier to switch away and do something while a DMA is in progress or while the RSP is crunching. In theory, everything can be done without threads, but in practice we posit that most homebrew developers will be better served with a threading module.

In my upcoming audio library, threading Is basically necessary as processing audio requires switching between CPU and RSP several times, and doing that only with interrupts gets complicated fast. On the contrary, with threads, it's very easy to keep audio processing in background.

This PR just introduces the kernel. It doesn't attempt to revisit all existing libdragon APIs to do some thread-aware yield instead of spin-looping. That's something for another day. Also, we will have to discuss whether it makes sense for APIs to automatically use threads if available and fallback to spinloops, or it's better to have a different set of APIs. A good example to reason about is dma_read or even wait_ms.

Description of the kernel module

This module implements a hybrid cooperative/preemptive multi-threaded kernel for parallel execution of code.

The kernel is "hard real-time": any thread with high priority that is ready (not blocked) will always have priority over lower priority threads. This is required to allow threads to implement operations that require low latency, for instance preparing audio when the AI interrupt fires.

The kernel is not fully preemptive; in particular, there is no timer interrupt that switches among ready threads at a fixed interval. This is not deemed necessary as applications will usually have a low number threads that are mostly blocked waiting for specific events like hardware interrupts or background activities like RSP ucode.

An event, represented by a kevent_t instance, is a signal that can be broadcasted to many different threads. A thread can use kthread_wait_event to wait for a specific event to fire. The kernel library includes a set of events bound to MI peripheral interrupts, so that it's possible for threads to wait for specific interrupts and act after them.

Typically, applications will keep the main thread at low priority for the main business logic, and will spawn a few threads to handle high-priority / low-latency responses to hardware events. Libdragon itself offers libraries that need threading to provide a easy-to-use API.

A context-switch between threads can happen in the following situations:

  • Preemptively: when a hardware interrupt generates a event that wakes up a thread (see kevent). This is the most common scenario: an event connected to an interrupt is fired, and a higher priority thread waiting for the event is woken up.

  • Cooperatively: when a thread explicitly calls a blocking function like kthread_sleep or kthread_wait_event to wait for an event. The kernel will put the current thread to sleep and schedule the highest-priority thread among the ready ones, including threads that have a priority lower than the one that goes to sleep. When the event is fired, the lower-priority thread will be preempted to give back control to the thread that just woke up.

  • Cooperatively: when a thread explicitly calls kthread_yield to leave the CPU to other threads that might be ready at the same time. Notice that, given the hard real-time guarantee, this only makes sense if there are multiple ready threads at the same priority level, as calling kthread_yield will not schedule a lower-priority thread. It is normally not necessary to call thread_yield.

In addition to threads (kthread_t) and events (kevent_t), the kernel library offers a message passing communication primitive called "message box" (kmbox_t). A mailbox is a circular buffer of messages called "mails"; any number of threads can try to send mails into a mailbox, and any number of threads can try to read mails from a mailbox. Once a thread receives a mail from a mailbox, the mail is removed and other threads will not see it anymore.

Mailboxes can be useful as communicative primitives among threads (eg: to send tasks to other threads), or a synchronization primitive to wait for multiple events (see kmbox_attach_event).

rasky avatar Jun 29 '21 22:06 rasky

Incredible stuff Rasky. My initial and superficial feedback is just about naming consistency, internally and with libdragon.
Within kernal.h, the same concept is referred to as a message box, a mail box and as kmbox. I think a single name should be applied throughout. Libdragon usually spells things out in full, so personally I like kernel_mailbox_t etc.

The k prefix also seems a bit inconsistent with libdragon's naming, but I'm undecided on whether every function here being called kernel_blah_blah_blah is a better solution.

kmbox_empty reads a bit ambiguously to me. In my projects these would be is_empty and is_full, though the only places libdragon uses that convention are from my own PRs, so I don't have much to justify myself there.

joeldipops avatar Jun 30 '21 05:06 joeldipops

I love this so much! kernel.h is a lot easier to read and parse through than libultra's OSMesgQueue documentation.

I've tried to look for things that might make the API a little easier to handle for a programmer new to low-level C programming. Stuff like kthread_new returning NULL if we're out of memory, or optional thread names. The N64Brew Game Jam had a variety of developers with different experience levels, and I'd love to advocate for ease-of-use on their behalf. If this is able to empower someone to make something new for the Nintendo 64, I'm all for it.

danbolt avatar Jun 30 '21 18:06 danbolt

Within kernal.h, the same concept is referred to as a message box, a mail box and as kmbox. I think a single name should be applied throughout. Libdragon usually spells things out in full, so personally I like kernel_mailbox_t etc.

Any suggestion on whether it's better to use "message box" or "mail box" in the first place? I'd go with mailbox just because it's shorter.

The k prefix also seems a bit inconsistent with libdragon's naming, but I'm undecided on whether every function here being called kernel_blah_blah_blah is a better solution.

Yeah that's the reason why I haven't put the full kernel_ in front of everything. If we don't like the k, I guess a reasonable alternative is to drop the kernel dependency altogether and simply have thread_new, etc.

kmbox_empty reads a bit ambiguously to me. In my projects these would be is_empty and is_full, though the only places libdragon uses that convention are from my own PRs, so I don't have much to justify myself there.

No strong opinions here. In general I prefer shorter to longer, so I tend to err on the side of dropping is from booleans.

rasky avatar Jul 02 '21 15:07 rasky

I've tried to look for things that might make the API a little easier to handle for a programmer new to low-level C programming.

Absolutely! In the debugging library PR, I've started also adding asserts here and there in libdragon in situations that have bitten me in the past and in which debugging was hard (eg: calling dma_read with size 0 used to freeze the console, and it took me a couple of debugging days to find the bug in a complex code where the size was dynamically calculated depending on all kind of async events). This is also why I made sure that now we can actually call assert() in the first place and see the output on the screen (and the debug spew).

I'm all for helping programmers with debugging features especially in an embedded context where there are many hurdles.

In this PR, as you might have seen, I've added stack smashing protection. In my experience, the most common case of instability brought by threads in an embedded system is programmers overflowing the stacks without realizing them. So this kernel adds some cookies at the end of the stack and every time the stack is switched away, it verifies if the cookies were corrupted.

I've also played a bit with TLBs; it could be possible to map the stack using TLB so that every single memory access outside the allocated range would immediately trigger an exception. The main issue is that TLBs have only a few fixed sizes (for our purposes: 4K, 16K, 64K), so you would need to map a combination of those to cover the stack (for instance, for a 28K stack, one would need 1 16K TLB + 3 4K TLBs), and in general you would be limited to 4K increments at that point.

What do people think about this? Is it worth it, compared to what is already implemented here?

rasky avatar Jul 02 '21 16:07 rasky

Within kernal.h, the same concept is referred to as a message box, a mail box and as kmbox. I think a single name should be applied throughout. Libdragon usually spells things out in full, so personally I like kernel_mailbox_t etc.

Any suggestion on whether it's better to use "message box" or "mail box" in the first place? I'd go with mailbox just because it's shorter.

I removed all references to "message box", it's now called "mailbox" everywhere. The type name is still kmbox_t but I can be talked into kmailbox_t if people feel strongly.

rasky avatar Jul 05 '21 07:07 rasky

I am towards removing the k/kernel prefix too, but wont be something I will insist on. TLB idea looks really good, though I think we can implement it later on.

anacierdem avatar Jul 05 '21 13:07 anacierdem

Another idea: what about exposing a define to prevent the kernel checks? In case someone does not use it or provides their own alternative implementation.

anacierdem avatar Aug 01 '21 12:08 anacierdem

I haven't verified this yet (may need a separate test) but I suspect that a stack callback in a thread may not be able to work properly. Think about this code sample;

void func_th(void *arg) {
	volatile bool cb1_called = false;

	void cb1(int ovlf) {
		cb1_called = true;
	}

	timer_init();

	timer_link_t *t1 = new_timer(TICKS_FROM_MS(2), TF_ONE_SHOT, cb1);

        kthread_set_pri(NULL, 2);

	while (!cb1_called) {}

        // I think this will never be able to continue execution
}

int main() {
	kernel_init();

	kthread_set_pri(NULL, 5);
	kthread_new("test1", 2048, 10, func_th, NULL);

        while(1){}
}
  • func_th will get immediately scheduled and register a timer callback yielding itself
  • main thread starts the idle loop, we are in the main context now
  • timer expires, stores registers in the main stack
  • then it calls cb1 before we schedule the secondary func_th thread
  • the callback potentially clobbers a main thread register/memory (register are safe though)
  • func_th is scheduled back, cb1_called was not set and it hangs.

I may be totally off with this but seems like it needs a test of its own. If this is not working as expected, we need to declare global functions incompatible as threads.

anacierdem avatar Aug 16 '21 08:08 anacierdem

GNU stack function extension seems to be smart enough to keep sp and arrange the trampoline for any parameters. Still might be a good idea to add tests at some point.

anacierdem avatar Aug 16 '21 08:08 anacierdem

Another idea: what about exposing a define to prevent the kernel checks? In case someone does not use it or provides their own alternative implementation.

Done.

rasky avatar Aug 19 '21 08:08 rasky

Another idea: what about exposing a define to prevent the kernel checks? In case someone does not use it or provides their own alternative implementation.

Done.

Actually I was trying to reference if (__kernel)s but the ability to disable cookie checks is also a good idea :)

anacierdem avatar Aug 19 '21 13:08 anacierdem

Actually I was trying to reference if (__kernel)s but the ability to disable cookie checks is also a good idea :)

Oh :) For that, I'd wait to see how things shape up. The hope is that there shouldn't be ifs in user code. For instance, code would call dma_read and would get the correct behavior whether it's using the kernel or not. Obviously this is making things more complicated for us (it would be easier to simply mandate kernel usage, like Nintendo did), but it might be worthwhile for a while, as we're exposing the real benefits of the kernel.

rasky avatar Aug 19 '21 21:08 rasky

I've realized that this PR is missing the part about making newlib safe for threading.

I think there are 3 things to do:

  1. keep one copy of the reentrancy structure (struct reent) in the TCB, and switch the global reent pointer (_impure_ptr) in the scheduler any time a thread switches. This allows each thread to have its own errno, and in general its own global context as required by libc. This is easy and has ~overhead, it just adds ~2K of RAM per each thread. BTW there is also a __REENT_SMALL define that we can investigate while compiling newlib to reduce this RAM overhead per thread.
  2. provide a reentrant-safe version of the syscalls in system.c, instead of the non-reentrant versions we're providing now. This is something we could do always, even if the kernel is not used, because it doesn't give overhead (in fact, it removes useless wrappers that newlib has to create for each syscall). We also need to rebuild newlib with macro REENTRANT_SYSCALLS_PROVIDED.
  3. implement locks as required by sys/lock.h. This would be easy if I had implemented mutex/semaphore in the kernel... which I haven't. I think for the audio library I will end up needing a mutex anyway, so I might as well work on it.

If somebody wants to help, I think 2 is an easy pick. It's a semi-mechanical change.

rasky avatar Aug 26 '21 13:08 rasky

Status update: this PR would be ready to go. I'm going to test it on a real world scenario (the audio library -- creating a audio thread) before actually merging this, so that we can verify that the API is sound and working.

As for the newlib reentrancy issue, this can be fixed in a followup.

rasky avatar Oct 02 '21 20:10 rasky

I just have one thing in my mind:

  • It'd be nice if there was a way to disable preemption when we need so that we can use non-entrant things in threads. It's been a long time and I don't remember the code ATM. If just wrapping with disable/enable interrupts will do it let's merge this once you are content. Or is there a better way?

anacierdem avatar Oct 03 '21 12:10 anacierdem

For now you can disable interrupts for that. But to do the audio thread support, I'll add proper preemption disable, which is actually quite easy to implement (i think).

rasky avatar Oct 03 '21 13:10 rasky