fish-shell icon indicating copy to clipboard operation
fish-shell copied to clipboard

Get OpenBSD executable path

Open time-killer-games opened this issue 3 years ago • 26 comments

The code in master doesn't allow for this, in OpenBSD to get the executable path, and mounting the procfs is no longer possible.

Feel free to add me under people assigned.

I just want to know in advance if this code will be accepted or if it's too hacky to be taken seriously: https://github.com/time-killer-games/libfilesystem/blob/2f773e866f99f7ea2228f970dbf43ed3128df821/filesystem.cpp#L541-L587

Of course, I will rewrite this code to be C compliant once the pr will be opened. OpenBSD has no official API for getting the current exe path, and I don't know if it ever did, it very well may have when the procfs was available, but i wasn't an OpenBSD user when that was still available to the OS. This is pretty much the best we can do without an official API.

time-killer-games avatar Jul 22 '22 11:07 time-killer-games

Here's what the solution would look like:

https://github.com/time-killer-games/getexecname/blob/patch-1/getexecname.c

Note this is a custom fork of someone else's code which is a C solution I modified to support getting the $PWD for specific use cases.

time-killer-games avatar Jul 23 '22 21:07 time-killer-games

You'll want to look through our codebase, because we do of course have helpers to find an executable in $PATH, or to add $PWD to a path.

That should make this quite a bit simpler.

faho avatar Jul 24 '22 07:07 faho

Tbh I hate the complexities of that code, the hoops we'd have to jump through.

Because it relies on $PATH and the working directory, you need to jump through hoops to get the original values of those. But even that isn't actually guaranteed to be correct!

Because what you need is the $PATH and cwd of fish's parent process, at the time it started fish. And that is iff it decided to not play games with argv[0] - like call us as -fish.

So I'm tempted to just declare that we'll be printing "fish" on OpenBSD until they grow a feature that allows us to actually be correct.

faho avatar Jul 24 '22 10:07 faho

if the executable path fails to be retrieved for one reason or another, couldn't we check for nullptr or an empty string and in such a case use hard-coded "fish" as a fallback? If not, this ticket can be closed.

The code here: https://github.com/time-killer-games/getexecname/blob/patch-1/getexecname.c

To get the parent process values, it would be as simple as plugging in the getppid() instead of getpid() to sysctl() but like you say that may still have issues regardless

Another option instead of using argv[0] would be to use getprogname() although that would break with a relative path to fish in some cases. Not many people would use a relative path to fish, most people would would run based on the filename spec without the directory spec. Thus, the using $PATH + getprogname might be good enough.

time-killer-games avatar Jul 24 '22 14:07 time-killer-games

Tbh I hate the complexities of that code, the hoops we'd have to jump through.

Because it relies on $PATH and the working directory, you need to jump through hoops to get the original values of those. But even that isn't actually guaranteed to be correct!

Because what you need is the $PATH and cwd of fish's parent process, at the time it started fish. And that is iff it decided to not play games with argv[0] - like call us as -fish.

So I'm tempted to just declare that we'll be printing "fish" on OpenBSD until they grow a feature that allows us to actually be correct.

I've come up with a means to get a proper error code if the executable path returns a path to the wrong file. Basically if meams getting the vnode from the current process id, get the stat struct from the vnode, the compare a few stat structure members with the stat structure returned by opening the path the function guessed points to the current executable. It involves working with a lot of kernel functions which do have documentation but for one reason or another I can't get any of it to compile because linking errors, and the docs don't say anything about what libraries I should be linking to. It will take a while, but this is a lot better.

time-killer-games avatar Jul 25 '22 18:07 time-killer-games

I'm surprised the OpenBSD kernel developers do not provide a way to do this, considering their security stance and repute.

floam avatar Jul 25 '22 18:07 floam

I read this a bit: https://github.com/ziglang/zig/issues/6718

Honestly IMO if we need to do any more work than take advantage of the platform-specific API to get the executable path, I don't think we should go so far as trying to derive it by heroic means.

I admittedly don't fully understand where the problem is.

floam avatar Jul 25 '22 18:07 floam

Another option instead of using argv[0] would be to use getprogname() although that would break with a relative path to fish in some cases. Not many people would use a relative path to fish, most people would would run based on the filename spec without the directory spec. Thus, the using $PATH + getprogname might be good enough.

It would not. In fact that would barely be better than what we have now.

the compare a few stat structure members with the stat structure returned by opening the path the function guessed points to the current executable

Please don't. That just adds more guesswork and complication, for unclear gain.


Here's what we can do:

  1. We can resolve the executable path once, at startup
  2. If we do that at startup, we can check for it in $PWD and/or $PATH at that time - just do it before we read our config and it's fine
  3. If it doesn't resolve, return "fish" so people using status fish-path could conceivably find fish via $PATH at that time
  4. (optional) As one special case, handle -fish like fish.

faho avatar Jul 25 '22 18:07 faho

I'm like a few days late on this, but I'm happy with how the NetBSD one went. It just needed different sysctl parameters to get the executable path for the current process. My understanding was there was some discussion of trying to do a normalization step to remove redundancies like symlinks, or '..' appearing in the middle - and this was (IMO) correctly abandoned because we still got an actual path to the executable. What is the specific issue on OpenBSD?

Sorry for asking this all dumbed up but I tried to check the commits at one point a couple days ago on one of these and the branch was deleted, and I'm seeing these links to other projects - please sum it up.

floam avatar Jul 25 '22 19:07 floam

To expand on my previous comment, here's some general notes from my research yesterday.

I know you don't approve of this idea, I'm just putting it here for anyone who might stumble across this issue on Google who might find it useful.

To get the vnode from a process id, you must get the struct process * from the process id using the function struct process *prfind(pid_t) declared in sys/proc.h. Then from that struct process * there is the member of the struct called struct vnode *ps_textvp which according to the code comment and other research is in fact the struct vnode * of the process's executable file. The struct process * structure is defined in sys/proc.h. The _KERNEL and __need_process macros need to be defined before including the sys/proc.h header otherwise the struct process * will not be defined. Defining _KERNEL before including sys/proc.h must be done, however it breaks my code if I define that macro before including sys/vnode.h, which is required for the int VOP_GETATTR(struct vnode *, struct vattr *, struct ucred *, struct proc *) function among other things we need, like the struct vattr * structure, which holds the information necessary to get a proper struct stat * from the struct vnode *. To actually get the struct stat * from the struct vnode *, we call int vn_stat(struct vnode *, struct stat *, struct proc *) which also relies on another structure we need to magically get from the current process id, struct proc *. I'm not sure if that parameter can be null or not, or if that would cause issues. MAXCPUS is a macro which needs to be defined, and I'm not sure what header is the right one to include for it, but it depends on the architecture when done incorrectly because each architecture of binary has its own definition of its macro in a different folder in the source tree named after the architecture. For example, if you are building for amd64, it will be in a folder named amd64, etc. Despute having all the right includes and defining that MAXCPUS macro on my own as 64, it still gives me undefined references when building. According to the multiple sources I've found, comparing the ino_t, dev_t, (and possible filesize, modified timestamp) members of struct stat is good enough to know it is the correct file on the file system, the only issue one would run into is hard links of the executable, which have wrong locations, though it would be very unlikely to return them from the current argv[0].

time-killer-games avatar Jul 25 '22 19:07 time-killer-games

I'm like a few days late on this, but I'm happy with how the NetBSD one went. It just needed different sysctl parameters to get the executable path for the current process. My understanding was there was some discussion of trying to do a normalization step to remove redundancies like symlinks, or '..' appearing in the middle - and this was (IMO) correctly abandoned because we still got an actual path to the executable. What is the specific issue on OpenBSD?

Sorry for asking this all dumbed up but I tried to check the commits at one point a couple days ago on one of these and the branch was deleted, and I'm seeing these links to other projects - please sum it up.

The problem is on OpenBSD we rely on $PWD, $PATH, and the argv[0] of the executable to construct the path. All of this information may be modified by the developer, as well as even the user running the executable from their shell instance, which could lead to undesirable and incorrect results. OpenBSD has no kernel interface for this.

time-killer-games avatar Jul 25 '22 19:07 time-killer-games

I'm probably going to need to install OpenBSD again to figure this out then, because I just don't know what you mean. How do PWD, PATH, argv[0] relate to what the kernel can tell us?

floam avatar Jul 25 '22 19:07 floam

We can get perhaps the parent directory or something?

floam avatar Jul 25 '22 19:07 floam

To get the vnode from a process id, you must get the struct process * from the process id using the function struct process *prfind(pid_t) declared in sys/proc.h. Then from that struct process * there is the member of the struct called struct vnode *ps_textvp which according to the code comment and other research is in fact the struct vnode * of the process's executable file. The struct process * structure is defined in sys/proc.h

This is much too complicated to be usable. I recommend abandoning that approach, I would not want to merge it, and I would not be confident that it works.

Please see what I recommended above. Just resolve the path on startup, it's fine.

For anything more, pester the OpenBSD people to actually include something. Without kernel assistance this isn't going to get better.

faho avatar Jul 25 '22 19:07 faho

I think this is too difficult. https://marc.info/?l=openbsd-misc&m=144987773230417&w=2

floam avatar Jul 25 '22 19:07 floam

To get the vnode from a process id, you must get the struct process * from the process id using the function struct process *prfind(pid_t) declared in sys/proc.h. Then from that struct process * there is the member of the struct called struct vnode *ps_textvp which according to the code comment and other research is in fact the struct vnode * of the process's executable file. The struct process * structure is defined in sys/proc.h

This is much too complicated to be usable. I recommend abandoning that approach, I would not want to merge it, and I would not be confident that it works.

Please see what I recommended above. Just resolve the path on startup, it's fine.

For anything more, pester the OpenBSD people to actually include something. Without kernel assistance this isn't going to get better.

I'm fine with doing it how you describe, but I first need to know if this is/isn't going to happen. Is everyone on the same page as @floam that this should not be added? I think we should let all major contributors have at least some say here, and I'm not one of them.

time-killer-games avatar Jul 25 '22 19:07 time-killer-games

if I missed something feel free to reopen. This isn't some kind of veto

floam avatar Jul 25 '22 19:07 floam

I understand. It's hard to explain why we need those environment variables and argv[0]. The best i can offer is recommend you look at zig's implementation in the link you shared as well as the other links shared on this issue which give similar ways of doing it. TL;DR there is no way to do this officially in OpenBSD so we are using crafty workarounds in attempt to achieve the most unlikely result to break.

time-killer-games avatar Jul 25 '22 19:07 time-killer-games

I'm reopening this.

What we can do is as I described: We can resolve the path on startup. We'll also do this for the other ways of getting the path (the other "code paths"), and simply cache the executable path, so we don't need to do it every time we do status fish-path. Because we use the path every time (to figure out if we're relocated), this isn't some speculation either.

So we would be improving our code quality by doing that.

And the solution of trying to resolve via $PATH would be good enough for 99.87% of cases. If the calling program really fiddles with our argv[0], well that's tough - but the calling program could alread call fish, so it's not like there's a security boundary or anything. If it had given us arguments of -c "rm -rf ~/*" we'd have erased $HOME.

It would be worse to do it as it is now, because now the path is resolved when status fish-path is called, after all $PATH setup and cd and such is done! But if you simply do it early enough, that's fine, you can just use the original directory and $PATH.

What I'm not sure about is using $PWD - if our argv[0] is fish, then were we really started as ./fish? Probably not? And do we want to say it's a file called "fish" in the current directory just randomly if we could also be /usr/bin/fish (called via $PATH)? Probably not.

So it's more like

  • If path is absolute (i.e. starts with /), use it
  • If path isn't absolute and contains a /, look in $PWD
  • If path isn't absolute and doesn't contain a /, look in $PATH

which is basically what path_get_path does.

faho avatar Jul 25 '22 19:07 faho

Ok so quick question...

maybe_t<wcstring> path_get_path(const wcstring &cmd, const environment_t &vars) {
    auto result = path_try_get_path(cmd, vars);
    if (result.err != 0) {
        return none();
    }
    wcstring path = std::move(result.path);
    return path;
}

I read the docs for none() and it only returns true of all bits are zero.

so to use that function I would assume I do:

if (path_get_path(wargv[0], nullptr) != none()) {
   // not sure what to do with maybe_t<wcstring> i would know what to do if it was just a wcstring though
}

time-killer-games avatar Jul 26 '22 14:07 time-killer-games

I read the docs for none() and it only returns true of all bits are zero.

"none()" is part of our maybe_t implementation - basically a backport of std::optional.

In essence you'd use it like

auto path = path_get_path(wargv[0], nullptr);
if (path) {
    wcstring actual_path = *path;
}

This is because maybe_t overloads operator bool(), so you can use it like it's a null pointer - with the exception that maybe_t<wcstring> doesn't work in place of a wcstring like a NULL would in place of a wchar_t*.

faho avatar Jul 26 '22 15:07 faho

Thanks, good to know.

time-killer-games avatar Jul 26 '22 15:07 time-killer-games

After further research, and major help from the OpenBSD developer mailing list, I've found code that can verify the string we have on whether it points to the correct executable or one of its hard links.

https://github.com/time-killer-games/xproc/commit/03a643a7d33e679994f4896acfca7a9fa7d5467f

Ignore the free_cmdline() function. It just calls delete[] on my argv buffer. This requires including <kvm.h> and linking with -lkvm. proc_id can be replaced with getpid(). *buffer can be replaced with our best guess on the executable path once its finally resolved. Let me know what you all think.

time-killer-games avatar Jul 26 '22 20:07 time-killer-games

In terms of complexity that seems acceptable. If it works, we can use it.

faho avatar Jul 27 '22 05:07 faho

So it's more like

* If path is absolute (i.e. starts with `/`), use it

* If path isn't absolute and contains a `/`, look in $PWD

* If path isn't absolute and doesn't contain a `/`, look in $PATH

which is basically what path_get_path does.

I'm assuming all 3 of those bullet points are handled by path_get_path()? Here's the current state of my code (along with existing code in master for context of where I'm putting it in-file, common.cpp):

/// Return the path to the current executable. This needs to be realpath'd.
std::string get_executable_path(const char *argv0) {
#ifdef __APPLE__
    // Note: _NSGetExecutablePath() is proprietary and might return a symlink.
    // Use the open source Darwin / XNU kernel proc_pidpath() function instead.
    // This is basically grabbing exec_path after argc, argv, envp, ...: for us.
    // https://opensource.apple.com/source/adv_cmds/adv_cmds-163/ps/print.c
    // The buffer must have the exact size of macro PROC_PIDPATHINFO_MAXSIZE.
    char buff[PROC_PIDPATHINFO_MAXSIZE]; uint32_t buff_size = sizeof buff;
    if (proc_pidpath(getpid(), buff, buff_size) > 0) return std::string(buff);
#else
    char buff[PATH_MAX];
    size_t buff_size = sizeof buff;
#if defined(__BSD__) && defined(KERN_PROC_PATHNAME)
    // BSDs do not have /proc by default, (although it can be mounted as procfs via the Linux
    // compatibility layer). We can use sysctl instead: per sysctl(3), passing in a process ID of -1
    // returns the value for the current process.
#if defined(__NetBSD__)
    int name[] = {CTL_KERN, KERN_PROC_ARGS, -1, KERN_PROC_PATHNAME};
#else
    int name[] = {CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1};
#endif
    int result = sysctl(name, sizeof(name) / sizeof(int), buff, &buff_size, nullptr, 0);
    if (result != 0) {
        wperror(L"sysctl KERN_PROC_PATHNAME");
    } else {
        return std::string(buff);
    }
#elif defined(__OpenBSD__)
    static kvm_t *kd = nullptr;
    std::string narrow; bool ok = false;
    wcstring wargv0 = str2wcstring(argv0);
    // not sure if I am doing this right??
    null_environment_t vars = null_environment_t();
    auto path = path_get_path(wargv0, vars);
    if (path) {
        char errbuf[_POSIX2_LINE_MAX];
        kinfo_file *kif = nullptr; int cntp = 0;
        narrow = wcs2string(*path); struct stat st; 
        kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); 
        if (kd) {
            if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, getpid(), sizeof(struct kinfo_file), &cntp))) {
                for (int i = 0; i < cntp; i++) {
                    if (kif[i].fd_fd == KERN_FILE_TEXT) {
                        if (!stat(narrow.c_str(), &st)) {
                            if (st.st_dev == (dev_t)kif[i].va_fsid || st.st_ino == (ino_t)kif[i].va_fileid) {
                                ok = true;
                                break;
                            }
                        }
                    }
                }
            }
            kvm_close(kd);
        }
    }
    narrow[buff_size] = '\0';
    strcpy(buff, narrow.substr(0, buff_size).c_str());
    return ((ok) ? std::string(buff)  : "fish");
#else
    // On other unixes, fall back to the Linux-ish /proc/ directory
    ssize_t len;
    len = readlink("/proc/self/exe", buff, sizeof buff - 1);  // Linux
    if (len == -1) {
        len = readlink("/proc/curproc/file", buff, sizeof buff - 1);  // other BSDs
        if (len == -1) {
            len = readlink("/proc/self/path/a.out", buff, sizeof buff - 1);  // Solaris
        }
    }
    if (len > 0) {
        buff[len] = '\0';
        // When /proc/self/exe points to a file that was deleted (or overwritten on update!)
        // then linux adds a " (deleted)" suffix.
        // If that's not a valid path, let's remove that awkward suffix.
        std::string buffstr{buff};
        if (access(buff, F_OK)) {
            auto dellen = const_strlen(" (deleted)");
            if (buffstr.size() > dellen &&
                buffstr.compare(buffstr.size() - dellen, dellen, " (deleted)") == 0) {
                buffstr = buffstr.substr(0, buffstr.size() - dellen);
            }
        }
        return buffstr;
    }
#endif
#endif

The OS X changes will not be conflated with my OpenBSD pull request. It will be handled separately if approved.

I don't really work with CMake or makefiles that often. Where do you want me to link with -lkvm?

time-killer-games avatar Jul 27 '22 18:07 time-killer-games

I guess my last question would be what do i put for the path_get_path environment_t argument? Since it doesn't take a nullptr.

time-killer-games avatar Jul 27 '22 20:07 time-killer-games