
Initiate migration from kernel/user-space scheduler

Open beowulf opened this issue 7 years ago • 4 comments

Currently, migration is initiated by a system call. It would be interesting to support scheduler-initiated migration as well.

beowulf avatar Aug 10 '17 04:08 beowulf

@beowulf Are you referring to popcorn_propose_migration syscall (and removing it)?

@mohamed-karaoui How do you currently initiate migration, if not via the syscall? I have threads open a Unix domain socket to let the scheduler know that they are at a migration point; the scheduler then responds with a node ID, and migration proceeds immediately (if at all).
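For illustration, the thread side of that socket exchange might look roughly like the sketch below. The message layout (`migration_request`, `migration_reply`), the function name `query_scheduler`, and the convention that -1 means "do not migrate" are all assumptions made for this example, not the actual protocol.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire format: the thread sends its TID and the scheduler
 * answers with a node ID, or -1 for "stay where you are". */
struct migration_request { int32_t tid; };
struct migration_reply   { int32_t node_id; };

/* Called at a migration point. `fd` is an already-connected AF_UNIX
 * stream socket to the scheduler. Returns the target node ID, or -1 if
 * no migration is requested or the exchange failed. */
int query_scheduler(int fd, pid_t tid)
{
    struct migration_request req = { .tid = (int32_t)tid };
    struct migration_reply rep;

    if (send(fd, &req, sizeof req, 0) != (ssize_t)sizeof req)
        return -1;
    /* MSG_WAITALL: block until the whole reply has arrived. */
    if (recv(fd, &rep, sizeof rep, MSG_WAITALL) != (ssize_t)sizeof rep)
        return -1;
    return rep.node_id;
}
```

The caller would then trigger the actual migration only when the returned node ID is non-negative.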

acarno avatar Feb 17 '18 01:02 acarno

I generally put a syscall directly into the application code. More recently, I changed this to use environment variables to migrate at the initialisation of the application, just as main() is called.
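A rough sketch of the environment-variable variant, assuming the target node is passed as a decimal integer. The function name `target_node_from_env` and the idea of a variable such as `POPCORN_TARGET_NODE` are hypothetical; the real variable names are not given in this thread.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Read a node ID from the environment variable `name`.
 * Returns the node ID, or -1 if the variable is unset or malformed. */
int target_node_from_env(const char *name)
{
    const char *value = getenv(name);
    char *end;
    long node;

    if (value == NULL || *value == '\0')
        return -1;

    errno = 0;
    node = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || node < 0 || node > INT_MAX)
        return -1;

    /* The caller would issue the migration syscall with this node ID. */
    return (int)node;
}
```

Something like `target_node_from_env("POPCORN_TARGET_NODE")` could be called from the start of main(), or from a constructor function, before the application does any real work.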

But it would be nice to be able to initiate migration using a POSIX signal (I think Rob mentioned this once). The signal would be caught by the (migration?) library, which would set a global variable. The migration library would then check this variable at each function call, rather than using a syscall, to know whether migration is requested.

I tried this implementation, but since it requires TLS (we need a global variable for each thread), which is not stable, I gave up. I should have some initial code somewhere, though.

mohamed-karaoui avatar Feb 17 '18 02:02 mohamed-karaoui

@acarno @mohamed-karaoui Sorry guys, I only just saw this message. This "enhancement" is about providing a more native way to initiate migration in a homogeneous setting. Any thread should be migratable at any point (there are no non-equivalent points), which means the in-kernel scheduler can initiate the migration at any time. I think this would be the optimal form of the scheduler.

Regarding signal-driven migration: signals are not properly handled across nodes in the current implementation. Also, as far as I know, signals are defined at process granularity, so if you send a signal to a thread, it might be caught by another thread, right? Yes, you can define a signal mask, but every thread that should be migratable must leave the "migration signal" unmasked, which means we might not correctly steer the signal to the target thread. (If I am wrong, please correct me.)

beowulf avatar Feb 28 '18 05:02 beowulf

Hmm -- I suppose for homogeneous migration, you might be able to get away with migrating at any location. Maybe you could deliver a signal to a thread, then (ab)use the ucontext_t argument passed to your signal handler to get the stack/register information (see man 3 getcontext). I guess you wouldn't need stack transformation (at least not directly) -- it's the same architecture, no? This wouldn't necessarily apply with fancier instructions (e.g., SIMD instructions).

Linux provides the (Linux-specific) tgkill system call, which claims to deliver a signal to a particular thread in a particular thread group. I seem to remember it having weird behavior... though that could just be my fuzzy recollection. You would still have the problem of delivering the signal to every thread in the process (across nodes) -- that might require some hacking.

acarno avatar Feb 28 '18 15:02 acarno