
Odd behaviour with socketcall multiplexer handling

Open alip opened this issue 2 months ago • 0 comments

Hello kind people,

I am the main author of syd, which thankfully uses libseccomp to provide a portable sandbox. In my testing I have noticed a few oddities on architectures that have both the socketcall(2) system call and the newer non-multiplexed versions of the socket system calls. One example is ppc64:

$ syd-sys -a ppc64 socketcall
socketcall      102
$ syd-sys -a ppc64 send
send    334
sendto  335
sendmsg 341
sendmmsg        349
...

Now assume we want to install a portable filter that denies the MSG_OOB flag for the send(2) and recv(2) families. See the section "Denying MSG_OOB Flag in send/recv System Calls" for why this is relevant for a security boundary. For socketcall(2) we have no option but to divert the handling to userspace with the notify action, and that is completely fine. However, suppose you install a filter like this (excuse my Rust, but the idea should be fairly obvious):

            if restrict_oob {
                let oob = libc::MSG_OOB as u64;
                for (idx, sysname) in [
                    "recvmsg", "sendmsg", "send", "sendto", "sendmmsg", "recv", "recvfrom",
                    "recvmmsg",
                ]
                .iter()
                .enumerate()
                {
                    // MsgFlags is arg==2 for {recv,send}msg, and
                    //             arg==3 for send/recv, sendto/recvfrom, and sendmmsg/recvmmsg.
                    let sys = if let Ok(sys) = ScmpSyscall::from_name(sysname) {
                        sys
                    } else {
                        continue;
                    };
                    let idx = if idx <= 1 { 2 } else { 3 };
                    let err = ScmpAction::Errno(libc::EOPNOTSUPP);
                    let cmp = ScmpArgCompare::new(idx, ScmpCompareOp::MaskedEqual(oob), oob);
                    ctx.add_rule_conditional(err, sys, &[cmp])?;
                }
            }
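
For readers less familiar with the masked comparison used in the rule above, here is a minimal stand-alone sketch of the test it expresses: `(arg & mask) == datum`. The helper names `masked_equal` and `rule_matches` are hypothetical and only model the comparison; they are not part of libseccomp or syd.

```rust
/// MSG_OOB as defined for Linux in <sys/socket.h>.
const MSG_OOB: u64 = 0x1;

/// Models a masked-equality comparison: (arg & mask) == datum.
/// Hypothetical helper for illustration only.
fn masked_equal(arg: u64, mask: u64, datum: u64) -> bool {
    (arg & mask) == datum
}

/// True when a flags word would hit the EOPNOTSUPP rule,
/// i.e. when the MSG_OOB bit is set regardless of other flags.
fn rule_matches(flags: u64) -> bool {
    masked_equal(flags, MSG_OOB, MSG_OOB)
}
```

With this reading, a flags value of `0x41` (MSG_OOB combined with another bit) matches, while `0x40` alone does not, so unrelated flags pass through untouched.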

One would expect the non-multiplexed versions of the send(2) family to be included in the filter, but with the latest libseccomp they are not, and our MSG_OOB tests fail on such architectures (ppc64, x86, ...) because of this.
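
As background for the socketcall(2) side: the multiplexer receives a sub-call number and a pointer to an argument array, so the flags word sits in the caller's memory where a seccomp BPF filter cannot read it. That is why this path goes to the notify action. Below is a minimal sketch of the lookup a userspace handler might perform. The sub-call numbers come from <linux/net.h>; the helper name `flags_index` is hypothetical and not taken from syd or libseccomp.

```rust
// socketcall(2) sub-call numbers from <linux/net.h>.
const SYS_SEND: u64 = 9;
const SYS_RECV: u64 = 10;
const SYS_SENDTO: u64 = 11;
const SYS_RECVFROM: u64 = 12;
const SYS_SENDMSG: u64 = 16;
const SYS_RECVMSG: u64 = 17;
const SYS_RECVMMSG: u64 = 19;
const SYS_SENDMMSG: u64 = 20;

/// Returns the zero-based position of the flags word inside the
/// socketcall argument array, or None when the sub-call is not part
/// of the send/recv families. Hypothetical helper for illustration.
fn flags_index(call: u64) -> Option<usize> {
    match call {
        // send/recv(fd, buf, len, flags), sendto/recvfrom(fd, buf, len, flags, ...)
        SYS_SEND | SYS_RECV | SYS_SENDTO | SYS_RECVFROM => Some(3),
        // sendmsg/recvmsg(fd, msg, flags)
        SYS_SENDMSG | SYS_RECVMSG => Some(2),
        // sendmmsg/recvmmsg(fd, msgvec, vlen, flags, ...)
        SYS_SENDMMSG | SYS_RECVMMSG => Some(3),
        _ => None,
    }
}
```

The indices agree with the argument positions used in the filter snippet: 2 for {recv,send}msg and 3 for the other calls.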

I have also encountered a similar problem where it is not directly possible to add notify actions to the non-multiplexed versions of the socket system calls. That, however, was possible to work around:

    /// Insert a system call handler.
    #[expect(clippy::cognitive_complexity)]
    #[expect(clippy::disallowed_methods)]
    fn insert_handler(
        handlers: &mut HandlerMap,
        syscall_name: &'static str,
        handler: impl Fn(UNotifyEventRequest) -> ScmpNotifResp + Clone + Send + Sync + 'static,
    ) {
        for arch in SCMP_ARCH {
            if let Ok(sys) = ScmpSyscall::from_name_by_arch(syscall_name, *arch) {
                #[expect(clippy::disallowed_methods)]
                handlers
                    .insert(
                        Sydcall(sys, scmp_arch_raw(*arch)),
                        Arc::new(Box::new(handler.clone())),
                    )
                    .unwrap();
            } else {
                info!("ctx": "confine", "op": "hook_syscall",
                    "msg": format!("invalid or unsupported syscall {syscall_name}"));
            }

            // Support the new non-multiplexed ipc syscalls.
            if IPC_ARCH.contains(arch) {
                let sys_ipc = match syscall_name {
                    "shmat" => Some(397),
                    "msgctl" => Some(402),
                    "semctl" => Some(394),
                    "shmctl" => Some(396),
                    "msgget" => Some(399),
                    "semget" => Some(393),
                    "shmget" => Some(395),
                    _ => None,
                };

                if let Some(sys) = sys_ipc {
                    #[expect(clippy::disallowed_methods)]
                    handlers
                        .insert(
                            Sydcall(ScmpSyscall::from(sys), scmp_arch_raw(*arch)),
                            Arc::new(Box::new(handler.clone())),
                        )
                        .unwrap();
                    continue;
                }
            }

            // Support the new non-multiplexed network syscalls on M68k, MIPS, PPC, S390 & X86.
            let sys = match *arch {
                ScmpArch::M68k => match syscall_name {
                    "socket" => 356,
                    "bind" => 358,
                    // no accept on m68k.
                    "accept4" => 361,
                    "connect" => 359,
                    "getpeername" => 365,
                    "getsockname" => 364,
                    "getsockopt" => 362,
                    "recvfrom" => 368,
                    "sendto" => 366,
                    "sendmsg" => 367,
                    "sendmmsg" => 372,
                    _ => continue,
                },
                ScmpArch::Mips | ScmpArch::Mipsel => match syscall_name {
                    "socket" => 183,
                    "bind" => 169,
                    "accept" => 168,
                    "accept4" => 334,
                    "connect" => 170,
                    "getpeername" => 171,
                    "getsockname" => 172,
                    "getsockopt" => 173,
                    "recvfrom" => 176,
                    "sendto" => 180,
                    "sendmsg" => 179,
                    "sendmmsg" => 343,
                    _ => continue,
                },
                ScmpArch::Ppc | ScmpArch::Ppc64 | ScmpArch::Ppc64Le => match syscall_name {
                    "socket" => 326,
                    "bind" => 327,
                    "accept" => 330,
                    "accept4" => 344,
                    "connect" => 328,
                    "getpeername" => 332,
                    "getsockname" => 331,
                    "getsockopt" => 340,
                    "recvfrom" => 337,
                    "sendto" => 335,
                    "sendmsg" => 341,
                    "sendmmsg" => 349,
                    _ => continue,
                },
                ScmpArch::S390X | ScmpArch::S390 => match syscall_name {
                    "socket" => 359,
                    "bind" => 361,
                    // no accept on s390x.
                    "accept4" => 364,
                    "connect" => 362,
                    "getpeername" => 368,
                    "getsockname" => 367,
                    "getsockopt" => 365,
                    "recvfrom" => 371,
                    "sendto" => 369,
                    "sendmsg" => 370,
                    "sendmmsg" => 358,
                    _ => continue,
                },
                ScmpArch::X86 => match syscall_name {
                    "socket" => 359,
                    "bind" => 361,
                    // no accept on x86.
                    "accept4" => 364,
                    "connect" => 362,
                    "getpeername" => 368,
                    "getsockname" => 367,
                    "getsockopt" => 365,
                    "recvfrom" => 371,
                    "sendto" => 369,
                    "sendmsg" => 370,
                    "sendmmsg" => 345,
                    _ => continue,
                },
                _ => continue,
            };

            handlers
                .insert(
                    Sydcall(ScmpSyscall::from(sys), scmp_arch_raw(*arch)),
                    Arc::new(Box::new(handler.clone())),
                )
                .unwrap();

            #[expect(clippy::arithmetic_side_effects)]
            if matches!(*arch, ScmpArch::Mips | ScmpArch::Mipsel) {
                // This is a libseccomp oddity,
                // it could be a bug in the syscall multiplexer.
                // TODO: Investigate and submit a bug report.
                handlers
                    .insert(
                        Sydcall(ScmpSyscall::from(sys + 4000), scmp_arch_raw(*arch)),
                        Arc::new(Box::new(handler.clone())),
                    )
                    .unwrap();
            }
        }
    }

Admittedly, it is a bit annoying to hardcode all of these, but it works.
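
One possible reading of the `sys + 4000` entry in the MIPS branch: the kernel's o32 ABI numbers its system calls starting at 4000 (`__NR_Linux`), so a table index such as 183 for socket corresponds to the raw number 4183. Whether this is what libseccomp trips over is an assumption, not something confirmed here. The sketch below only shows the arithmetic, with a hypothetical helper name `mips_o32_numbers`.

```rust
/// Base of the MIPS o32 syscall numbering (__NR_Linux in the kernel headers).
const MIPS_O32_BASE: i32 = 4000;

/// Returns both forms of a MIPS o32 syscall number: the bare table
/// index and the index with the o32 base added.
/// Hypothetical helper for illustration only.
fn mips_o32_numbers(table_index: i32) -> [i32; 2] {
    [table_index, table_index + MIPS_O32_BASE]
}
```

Registering a handler under both values, as the workaround does, covers either convention at the cost of one redundant map entry.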

I do not know whether this oddity is a bug, but I would expect the socketcall(2) and ipc(2) multiplexer handling in libseccomp to take care of these behind the scenes. Is this possible? Thank you in advance.

alip, Oct 26 '25 12:10