RFC: interruptible POSIX locks
I'm trying to fix the recently added POSIX locks (https://github.com/gramineproject/graphene/pull/2481) so that they can be interrupted, but I ran into a problem trying to do it with the current IPC. I don't want to jump in and change too much before I hear your opinions.
@boryspoplawski @mkow @dimakuv
The problem
Graphene's current implementation of POSIX locks (`fcntl`) does not support interrupting the request:
```c
void handler(int signum) {
    printf("got signal %d\n", signum);
}

signal(SIGALRM, &handler);
fcntl(fd, F_SETLK, ...);

pid_t pid = fork();
if (pid == 0) {
    // child process:

    // trigger SIGALRM after 10 seconds
    alarm(10);

    // this should be interrupted:
    fcntl(fd, F_SETLKW, ...);
    ...
}
```
The `fcntl(F_SETLKW)` in the child process should wait for 10 seconds and then fail with EINTR (after the process receives SIGALRM). Under Graphene, it hangs indefinitely.
So far, I've seen this checked by LTP (test `fcntl16`), and I'm pretty sure it's used by stress-ng (see gramineproject/graphene#2510). I'm not sure how much it's used in actual applications, but interrupting the `fcntl` call seems to be the only sensible way of trying to take a lock with a timeout.
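For illustration, the timeout pattern I have in mind looks roughly like this (a self-contained sketch, not Gramine code; the `lock_with_timeout` helper and its details are mine):

```c
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static void on_alarm(int signum) {
    (void)signum; // we only need fcntl() to get interrupted and fail with EINTR
}

// Try to take a write lock on the whole file, giving up after `seconds`.
// Returns 0 on success, or -1 with errno == EINTR on timeout.
static int lock_with_timeout(int fd, unsigned int seconds) {
    struct sigaction sa = { .sa_handler = on_alarm }; // note: no SA_RESTART
    sigaction(SIGALRM, &sa, NULL);

    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0, // 0 means "until the end of the file"
    };

    alarm(seconds);
    int ret = fcntl(fd, F_SETLKW, &fl);
    alarm(0); // disarm the timer if we got the lock in time
    return ret;
}
```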
How to interrupt an operation over IPC?
Interrupting the lock operation is easy in the single-process version. However, if we are in a remote process, the operation is requested over IPC, and it's not that easy to achieve the same semantics.
The current IPC implementation does something like this:
```c
lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this waits for response, and retries on EINTR
ipc_send_message_and_get_response(lock_msg, &result);
return result;
```
As long as `ipc_send_message_and_get_response` keeps retrying, we have no way to return earlier. I've been thinking about some solutions:
1. **Stop waiting?** You could be tempted to modify the `ipc_send_message_and_get_response()` function to not wait (perhaps with some `retry = false` parameter). However, this does not actually cancel the operation, so the lock will be taken eventually. This is wrong: we want the `F_SETLKW` operation to fail without side effects.

2. **Stop waiting and issue another call?** We could send another IPC message to cancel the pending operation. Keep in mind that the operation might get finished before the "cancel" message is delivered, so we need to learn what the actual result is.
   ```c
   lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

   // this returns -EINTR if the wait is interrupted
   ret = ipc_send_message_and_get_response(lock_msg, /*retry=*/false, &result);
   if (ret == -EINTR) {
       // Cancel, and get the result. `result` will be -EINTR if we successfully
       // canceled the operation, or something else (0, -ENOMEM, ...) if the
       // operation finished in the meantime
       cancel_msg = { POSIX_LOCK_CANCEL, lock_msg.seq, ... };
       ipc_send_message_and_get_response(cancel_msg, /*retry=*/true, &result);
   }
   return result;
   ```
   The problem with that is that if the operation does finish in the meantime, the remote side has to keep storing the result in case we receive `POSIX_LOCK_CANCEL`. So we would need some way of cleaning up that result. Also, if the response to `POSIX_LOCK_SET` is sent in the meantime, we are going to just drop it.

3. **Keep waiting for the original response?** Ideally, we want to always get the response for the original message, but sometimes we want to trigger the response to come early. So we could register a wait, but on EINTR, send an additional message in the meantime, then return to the previous wait:
   ```c
   lock_msg = { POSIX_LOCK_SET, .wait = true, ... };
   ipc_send_message_and_register_wait(lock_msg);
   ret = wait_for_response(lock_msg, /*retry=*/false);
   if (ret == -EINTR) {
       // Cancel. That will cause the response to `lock_msg` to be sent without
       // further delay (if it hasn't been sent already).
       cancel_msg = { POSIX_LOCK_CANCEL, lock_msg.seq, ... };
       ipc_send_message(cancel_msg);

       // Wait for the response to the original message. `result` will be -EINTR
       // if we canceled the operation, or something else (0, -ENOMEM, ...) if the
       // operation finished in the meantime.
       wait_for_response(lock_msg, /*retry=*/true, &result);
   }
   return result;
   ```
   I like that, but it makes the IPC API more complicated: instead of one function that does everything, there are separate stages (first send a message and register a wait, then wait for the response). That means managing some state between calls (a "waiter" registered with the IPC worker) that is currently local to `ipc_send_message_and_get_response`.
4. **Add an `on_interrupt` callback?** The above could be simplified by keeping `ipc_send_message_and_get_response` as is, but making it perform some additional action when the wait is interrupted:

   ```c
   int send_cancel(struct shim_ipc_message* msg) {
       cancel_msg = { POSIX_LOCK_CANCEL, msg->seq, ... };
       return ipc_send_message(cancel_msg);
   }

   lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

   // this calls `on_interrupt` when the wait is interrupted, then keeps on waiting
   // (or fails, if `on_interrupt` returned failure)
   ipc_send_message_and_get_response(lock_msg, /*on_interrupt=*/&send_cancel, &result);
   return result;
   ```
   This would certainly be easiest to use, but it's a pretty specific use case. I think it's worth it only if we expect to use it again soon. (A rough sketch of what this could look like inside the wait loop follows after this list.)
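For concreteness, here is how the retry loop inside `ipc_send_message_and_get_response` might invoke such a callback; the names and signatures below are my guesses, not the existing API:

```c
// Hypothetical internals: retry the wait as before, but give the caller a
// chance to react (e.g. send POSIX_LOCK_CANCEL) on each interruption.
static int wait_with_on_interrupt(struct shim_ipc_message* msg,
                                  int (*on_interrupt)(struct shim_ipc_message*),
                                  int* result) {
    for (;;) {
        int ret = wait_for_response(msg, /*retry=*/false, result); // -EINTR on signal
        if (ret != -EINTR)
            return ret;
        if (on_interrupt) {
            ret = on_interrupt(msg); // e.g. send_cancel()
            if (ret < 0)
                return ret; // fail if we couldn't even send the cancel message
        }
        // keep waiting: the response to the original message will still arrive,
        // carrying -EINTR if the cancel won the race
    }
}
```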
So I think either option 3 or option 4 is viable. But maybe I'm missing something and there is an easier way?
I prefer option 4.
Option 3 sounds like overkill to fix this issue (and similar "amend/cancel pending operation" issues). Splitting our nice `ipc_send_message_and_get_response()` into two phases, with some state maintained in between, sounds complex.
By the way, what is the `struct shim_ipc_message* msg` in your `send_cancel` callback? Why is this argument needed? And what is `msg->seq`, how does it help?
> Option 3 sounds like overkill to fix this issue (and similar "amend/cancel pending operation" issues). Splitting our nice `ipc_send_message_and_get_response()` into two phases, with some state maintained in between, sounds complex.
I agree; on the other hand, option 4 sounds over-specific. But since it seems cheap, maybe it doesn't hurt to try it out. I'll try finishing my branch with option 4 and we'll see how it looks.
> By the way, what is the `struct shim_ipc_message* msg` in your `send_cancel` callback? Why is this argument needed? And what is `msg->seq`, how does it help?
It was supposed to be the original message (`lock_msg`). I need `seq`, which is a message identifier (currently added by `ipc_send_message_and_get_response`), so that the "cancel" message says which operation to cancel. Actually, I'd probably like to use a generic `void* arg` parameter in the callback instead, to make it a bit more flexible; then `lock_msg` will be passed explicitly.
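Something like this, just to show what I mean by the generic argument (a hypothetical shape, not final code):

```c
// The callback takes an opaque argument instead of a specific message type:
typedef int (*ipc_callback_t)(void* arg);

int send_cancel(void* arg) {
    struct shim_ipc_message* lock_msg = arg; // the original request
    cancel_msg = { POSIX_LOCK_CANCEL, lock_msg->seq, ... };
    return ipc_send_message(cancel_msg);
}

// ... and the caller passes the original message explicitly:
ipc_send_message_and_get_response(lock_msg, /*on_interrupt=*/&send_cancel,
                                  /*arg=*/lock_msg, &result);
```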
How does option 4 work with both messages? Will we wait for responses to both (the original lock + the cancel message)? Overall it seems like an OK solution for now.
- `seq` wasn't meant to be a public interface; it was intended to stay inside the IPC implementation (i.e. not meant to be used as some other object's ID, which I guess option 4 proposes: use `seq` as a lock request ID). Using it outside and sending two messages with the same `seq` may break some core assumptions.
- `ipc_send_message_and_get_response` is not interruptible by design. It's meant to be an internal mechanism, and all public IPC interfaces are meant to be building blocks for other subsystems. Response messages are supposed to come ~soon, i.e. they are not supposed to do sleeps/waits in the message recipient (the destination IPC worker). Also, doing sleeps with functions other than `thread_wait` is not signal-safe/fully-signal-aware.
I would say you cannot get away from using something like:
```c
struct X x;
add_to_global_list(&x);
ipc_send_msg(get_lock);
thread_wait();
if (!check_if_condition_happened(&x, is_lock_taken, /*atomic_wrt_callbacks=*/true)) {
    ipc_send_msg(cancel_get_lock);
}
remove_from_global_list(&x);
return check_if_condition_happened(&x, is_lock_taken, /*atomic_wrt_callbacks=*/false);
```
and have some callbacks on `get_lock` response messages. This might sound a bit like reimplementing responses, but this seems to be a pretty special case. Also, this sounds like something that might need to be built into the sync server/client code?
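For example, the response-side callback could look roughly like this (illustrative names only; `find_on_global_list`, `thread_wakeup` and the fields of `struct X` are placeholders):

```c
// Runs in the IPC worker when a get_lock response arrives: find the matching
// waiter on the global list, record the result, and wake the sleeping thread.
void on_get_lock_response(unsigned long seq, int result) {
    struct X* x = find_on_global_list(seq);
    if (!x)
        return; // the waiter already gave up and removed itself
    x->result = result;
    x->got_response = true; // makes the waiter's condition check succeed
    thread_wakeup(x->thread);
}
```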
OK, looks like no easy win here. On the other hand, reimplementing responses doesn't seem so bad: an IPC abstraction for "long-term operations" might be exactly what I need. I'll think about it more.
@pwmarcz It's quite possible that we need to add some more advanced IPC primitives than we have right now.