RFC: interruptible POSIX locks
I'm trying to fix the recently added POSIX locks (https://github.com/gramineproject/graphene/pull/2481) so that they can be interrupted, but I ran into a problem trying to do it with the current IPC. I don't want to jump in and change too much before I hear your opinions.
@boryspoplawski @mkow @dimakuv
The problem
Graphene's current implementation of POSIX locks (`fcntl`) does not support interrupting the request:
```c
void handler(int signum) {
    printf("got signal %d\n", signum);
}

signal(SIGALRM, &handler);
fcntl(fd, F_SETLK, ...);

pid_t pid = fork();
if (pid == 0) {
    // child process:

    // trigger SIGALRM after 10 seconds
    alarm(10);

    // this should be interrupted:
    fcntl(fd, F_SETLKW, ...);
    ...
}
```
The `fcntl(F_SETLKW)` in the child process should wait for 10 seconds and then fail with EINTR (after the process receives SIGALRM). Under Graphene, it hangs indefinitely.
So far, I've seen this checked by LTP (test `fcntl16`), and I'm pretty sure it's used by stress-ng (see gramineproject/graphene#2510). I'm not sure how much it's used in actual applications, but interrupting the `fcntl` call seems to be the only sensible way of trying to take a lock with a timeout.
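For illustration, the timeout pattern I have in mind looks roughly like this (a self-contained sketch, not Gramine code; the `lock_with_timeout` helper and its details are mine):

```c
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static void on_alarm(int signum) {
    (void)signum; // we only need fcntl() to get interrupted and fail with EINTR
}

// Try to take a write lock on the whole file, giving up after `seconds`.
// Returns 0 on success, or -1 with errno == EINTR on timeout.
static int lock_with_timeout(int fd, unsigned int seconds) {
    struct sigaction sa = { .sa_handler = on_alarm }; // note: no SA_RESTART
    sigaction(SIGALRM, &sa, NULL);

    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0, // 0 means "until the end of the file"
    };

    alarm(seconds);
    int ret = fcntl(fd, F_SETLKW, &fl);
    alarm(0); // disarm the timer if we got the lock in time
    return ret;
}
```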
How to interrupt an operation over IPC?
Interrupting the lock operation is easy in the single-process version. However, if we are in a remote process, the operation is requested over IPC, and it's not that easy to achieve the same semantics.
The current IPC implementation does something like this:
```c
lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this waits for response, and retries on EINTR
ipc_send_message_and_get_response(lock_msg, &result);
return result;
```
As long as `ipc_send_message_and_get_response` keeps retrying, we have no way to return earlier. I've been thinking about some solutions:
1. **Stop waiting?** You could be tempted to modify the `ipc_send_message_and_get_response()` function to not wait (perhaps with some `retry = false` parameter). However, this does not actually cancel the operation, so the lock will be taken eventually. This is wrong: we want the `F_SETLKW` operation to fail without side effects.

2. **Stop waiting and issue another call?** We could send another IPC message to cancel the pending operation. Keep in mind that the operation might get finished before the "cancel" message is delivered, so we need to learn what the actual result is.
   ```c
   lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

   // this returns -EINTR if the wait is interrupted
   ret = ipc_send_message_and_get_response(lock_msg, /*retry=*/false, &result);
   if (ret == -EINTR) {
       // Cancel, and get the result. `result` will be -EINTR if we successfully
       // canceled the operation, or something else (0, -ENOMEM, ...) if the
       // operation finished in the meantime
       cancel_msg = { POSIX_LOCK_CANCEL, lock_msg.seq, ... };
       ipc_send_message_and_get_response(cancel_msg, /*retry=*/true, &result);
   }
   return result;
   ```
   The problem with that is that if the operation does finish in the meantime, the remote side has to keep storing the result in case we receive `POSIX_LOCK_CANCEL`. So we would need some way of cleaning up that result. Also, if the response to `POSIX_LOCK_SET` is sent in the meantime, we are going to just drop it.

3. **Keep waiting for the original response?** Ideally, we want to always get the response for the original message, but sometimes we want to trigger the response to come early. So we could register a wait, but on EINTR, send an additional message in the meantime, then return to the previous wait:
   ```c
   lock_msg = { POSIX_LOCK_SET, .wait = true, ... };
   ipc_send_message_and_register_wait(lock_msg);
   ret = wait_for_response(lock_msg, /*retry=*/false);
   if (ret == -EINTR) {
       // Cancel. That will cause the response to `lock_msg` to be sent without
       // further delay (if it hasn't been sent already).
       cancel_msg = { POSIX_LOCK_CANCEL, lock_msg.seq, ... };
       ipc_send_message(cancel_msg);

       // Wait for the response to the original message. `result` will be -EINTR
       // if we canceled the operation, or something else (0, -ENOMEM, ...) if the
       // operation finished in the meantime.
       wait_for_response(lock_msg, /*retry=*/true, &result);
   }
   return result;
   ```
   I like that, but it makes the IPC API more complicated: instead of one function that does everything, there are separate stages (first send a message and register a wait, then wait for the response). That means managing some state between calls (a "waiter" registered with the IPC worker) that is currently local to `ipc_send_message_and_get_response`.
4. **Add an `on_interrupt` callback?** The above could be simplified by keeping `ipc_send_message_and_get_response` as is, but making it perform some additional action when the wait is interrupted:

   ```c
   int send_cancel(struct shim_ipc_message* msg) {
       cancel_msg = { POSIX_LOCK_CANCEL, msg->seq, ... };
       return ipc_send_message(cancel_msg);
   }

   lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

   // this calls `on_interrupt` when the wait is interrupted, then keeps on waiting
   // (or fails, if `on_interrupt` returned failure)
   ipc_send_message_and_get_response(lock_msg, /*on_interrupt=*/&send_cancel, &result);
   return result;
   ```
   This would certainly be easiest to use, but it's a pretty specific use case. I think it's worth it only if we expect to use it again soon. (A rough sketch of what this could look like inside the wait loop follows after this list.)
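For concreteness, here is how the retry loop inside `ipc_send_message_and_get_response` might invoke such a callback; the names and signatures below are my guesses, not the existing API:

```c
// Hypothetical internals: retry the wait as before, but give the caller a
// chance to react (e.g. send POSIX_LOCK_CANCEL) on each interruption.
static int wait_with_on_interrupt(struct shim_ipc_message* msg,
                                  int (*on_interrupt)(struct shim_ipc_message*),
                                  int* result) {
    for (;;) {
        int ret = wait_for_response(msg, /*retry=*/false, result); // -EINTR on signal
        if (ret != -EINTR)
            return ret;
        if (on_interrupt) {
            ret = on_interrupt(msg); // e.g. send_cancel()
            if (ret < 0)
                return ret; // fail if we couldn't even send the cancel message
        }
        // keep waiting: the response to the original message will still arrive,
        // carrying -EINTR if the cancel won the race
    }
}
```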
So I think either option 3 or option 4 is viable. But maybe I'm missing something and there is an easier way?
I prefer option 4.
Option 3 sounds like overkill to fix this issue (and similar "amend/cancel pending operation" issues). Splitting our nice `ipc_send_message_and_get_response()` into two phases, with some state maintained in between, sounds complex.
By the way, what is the `struct shim_ipc_message* msg` in your `send_cancel` callback? Why is this argument needed? And what is `msg->seq`, how does it help?
> Option 3 sounds like overkill to fix this issue (and similar "amend/cancel pending operation" issues). Splitting our nice `ipc_send_message_and_get_response()` into two phases, with some state maintained in between, sounds complex.
I agree; on the other hand, option 4 sounds over-specific. But since it seems cheap, maybe it doesn't hurt to try it out. I'll try finishing my branch with option 4 and we'll see how it looks.
> By the way, what is the `struct shim_ipc_message* msg` in your `send_cancel` callback? Why is this argument needed? And what is `msg->seq`, how does it help?
It was supposed to be the original message (`lock_msg`). I need `seq`, which is a message identifier (currently added by `ipc_send_message_and_get_response`), so that the "cancel" message says which operation to cancel. Actually, I'd probably like to use a generic `void* arg` parameter in the callback instead, to make it a bit more flexible; then `lock_msg` will be passed explicitly.
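Something like this, just to show what I mean by the generic argument (a hypothetical shape, not final code):

```c
// The callback takes an opaque argument instead of a specific message type:
typedef int (*ipc_callback_t)(void* arg);

int send_cancel(void* arg) {
    struct shim_ipc_message* lock_msg = arg; // the original request
    cancel_msg = { POSIX_LOCK_CANCEL, lock_msg->seq, ... };
    return ipc_send_message(cancel_msg);
}

// ... and the caller passes the original message explicitly:
ipc_send_message_and_get_response(lock_msg, /*on_interrupt=*/&send_cancel,
                                  /*arg=*/lock_msg, &result);
```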
How does option 4 work with both messages? Will we wait for responses to both (the original lock + the cancel message)? Overall it seems like an OK solution for now.
- `seq` wasn't meant to be a public interface; it was intended to stay inside the IPC implementation (i.e. not meant to be used as some other object's ID, which I guess option 4 proposes: use `seq` as a lock request ID). Using it outside and sending two messages with the same `seq` may break some core assumptions.
- `ipc_send_message_and_get_response` is not interruptible by design. It's meant to be an internal mechanism, and all public IPC interfaces are meant to be building blocks for other subsystems. Response messages are supposed to come ~soon, i.e. they are not supposed to do sleeps/waits in the message recipient (the destination IPC worker). Also, doing sleeps with functions other than `thread_wait` is not signal-safe/fully-signal-aware.
I would say you cannot get away from using something like:
```c
struct X x;
add_to_global_list(&x);
ipc_send_msg(get_lock);
thread_wait();
if (!check_if_condition_happened(&x, is_lock_taken, /*atomic_wrt_callbacks=*/true)) {
    ipc_send_msg(cancel_get_lock);
}
remove_from_global_list(&x);
return check_if_condition_happened(&x, is_lock_taken, /*atomic_wrt_callbacks=*/false);
```
and have some callbacks on `get_lock` response messages. This might sound a bit like reimplementing responses, but this seems to be a pretty special case. Also, this sounds like something that might need to be built into the sync server/client code?
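For example, the response-side callback could look roughly like this (illustrative names only; `find_on_global_list`, `thread_wakeup` and the fields of `struct X` are placeholders):

```c
// Runs in the IPC worker when a get_lock response arrives: find the matching
// waiter on the global list, record the result, and wake the sleeping thread.
void on_get_lock_response(unsigned long seq, int result) {
    struct X* x = find_on_global_list(seq);
    if (!x)
        return; // the waiter already gave up and removed itself
    x->result = result;
    x->got_response = true; // makes the waiter's condition check succeed
    thread_wakeup(x->thread);
}
```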
OK, looks like no easy win here. On the other hand, reimplementing responses doesn't seem so bad: an IPC abstraction for "long-term operations" might be exactly what I need. I'll think about it more.
@pwmarcz It's quite possible that we need to add some more advanced IPC primitives than we have right now.