GOTCHA icon indicating copy to clipboard operation
GOTCHA copied to clipboard

Workaround for glibc v2.34+ not exporting _dl_sym

Open jrmadsen opened this issue 2 years ago • 12 comments

  • added test_dlopen2 test validating RTLD_DEFAULT and RTLD_NEXT usage
  • closes #100

@mplegendre I believe this relatively straight-forward solution addresses the issue -- am I missing something here? I separated it out to only be applied when glibc >= 2.34 but it works fine with older versions of glibc.

jrmadsen avatar Dec 26 '21 17:12 jrmadsen

Thanks! Simply doing this inside dlsym_wrapper seems to have the same effect:

  if (handle==RTLD_NEXT || handle==RTLD_DEFAULT) {
     return orig_dlsym(handle, symbol_name);
  }

However, it would have been pretty easy to do this in the first place, so I'm not sure what the rationale behind using the internal __dlsym call was. I guess only @mplegendre can tell.

daboehme avatar Dec 26 '21 23:12 daboehme

@daboehme i interpreted using the internal _dl_sym as wanting to make sure that one is truly calling dlsym from libdl, since gotcha can/will layer wrappees or dlsym might be overridden by an LD_PRELOAD. And this implementation effectively retains that behavior.

jrmadsen avatar Dec 28 '21 22:12 jrmadsen

@daboehme i interpreted using the internal _dl_sym as wanting to make sure that one is truly calling dlsym from libdl, since gotcha can/will layer wrappees or dlsym might be overridden by an LD_PRELOAD. And this implementation effectively retains that behavior.

I suspect that might be because there is no guarantee that the wrappee correctly handles RTLD_DEFAULT or RTLD_NEXT

jrmadsen avatar Dec 28 '21 22:12 jrmadsen

i interpreted using the internal _dl_sym as wanting to make sure that one is truly calling dlsym from libdl, since gotcha can/will layer wrappees or dlsym might be overridden by an LD_PRELOAD. And this implementation effectively retains that behavior.

Ok, if it's that then the dlopen approach should work. Your solution works for me BTW, but I had to use "libdl.so.2" instead of "libdl.so" as there was no "libdl.so" symlink on my test system (Ubuntu).

If this approach works and fixes the issue we might as well drop the _dl_sym version entirely and skip the #ifdefs. But I'd still like @mplegendre to chime in.

daboehme avatar Jan 04 '22 19:01 daboehme

We used _dl_sym() to correctly handle the case when the application passes RTLD_NEXT parameter to dlsym(). With RTLD_NEXT, dlsym() will inspect up the call stack for the DSO of the caller, then find the symbol in a DSO that follows the caller DSO. Putting gotcha's wrappers around dlsym breaks the "inspect the call stack" part.

_dl_sym() had an extra parameter that let us pass the address to use instead of pulling it from the call stack. We could pass in the address from the original DSO and call dlsym from anywhere.

Off the top of my head, we have three ways to go forward:

  1. Forget about all this and just break RTLD_NEXT. It's not a common parameter anyway. I think it's mostly meant for LD_PRELOAD wrapping tools. And we're trying to offer and alternative to those.
  2. Dynamically build a trampoline assembly structure that bounces control flow through the original DSO, to dlsym, and then back to the gotcha wrappers. This need not be too tough, but I'm worried because this is a common virus technique. If security tools catches gotcha doing this then they might flag gotcha as a virus. That would become a headache.
  3. Reimplement dlsym with RTLD_NEXT at the gotcha level. We could have gotcha manually search DSOs for the symbol starting from the caller DSO, then use the original dlsym without RTLD_NEXT to create the function pointer and do any ld.so initialization.

I currently favor approach 3. But I'm worried there could be some subtleties in the search order that I'm not anticipating.

I haven't run/tested this PR, but from looking at it I think it's taking the 1 approach.

mplegendre avatar Jan 04 '22 19:01 mplegendre

Ok, thanks for the explanation. That's more complicated than I thought! Depending on how long it takes to implement option 3, I think we should in the meantime at least fix the build even if it breaks RTLD_NEXT for now.

daboehme avatar Jan 04 '22 21:01 daboehme

I haven't run/tested this PR, but from looking at it I think it's taking the 1 approach.

@mplegendre I don't think it is breaking RTLD_NEXT since this test snippet passes:

    retfive = (int (*)(void)) dlsym(RTLD_NEXT, "return_five");
    if(retfive == NULL)
    {
        fprintf(stderr, "ERROR: dlsym(RTLD_NEXT, 'return_five') returned null pointer\n");
        had_error = -1;
    }
    else if(retfive() != 6)
    {
        fprintf(stderr,
                "ERROR: dlsym(RTLD_NEXT, 'return_five') returned the wrong symbol\n");
        had_error = -1;
    }

jrmadsen avatar Jan 04 '22 22:01 jrmadsen

I took the patch and tried it with our survey tool. It compiles but something seems to be wrong as we aren't getting our performance data gathered. So, more investigation is needed by us in our survey use case.

jgalarowicz avatar Jan 12 '22 15:01 jgalarowicz

I took the patch and tried it with our survey tool. It compiles but something seems to be wrong as we aren't getting our performance data gathered. So, more investigation is needed by us in our survey use case.

Does your survey tool use RTLD_DEFAULT or RTLD_NEXT? Because this patch should not affect any other functionality if y'all do not.

jrmadsen avatar Jan 14 '22 19:01 jrmadsen

Hi Jonathan,

The gotcha usage section of survey isn't my code, but we appear to use RTLD_NEXT (grep below).

I just ran the application that was aborting previously, with our latest survey changes.

It appears that the abort, from not gathering performance data, we were seeing is gone.

I believe the gotcha patch is working for survey.

gotcha compiles and I've run several tests now where I get the expected results.

The workaround/fix looks good from our side.

Thanks

Jim G

grep: build/scripts/CMakeFiles: Is a directory collector/io/wrappers.c:#define __USE_GNU /* XXX forRTLD_NEXT on Linux */ collector/io/wrappers.c: *(void **) (&f_read) = dlsym(RTLD_NEXT, "read"); collector/io/wrappers.c: //ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "write"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "write"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "lseek"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "lseek64"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "open"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "open64"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "close"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "dup"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "dup2"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "creat"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "creat64"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "pipe"); collector/io/wrappers.c: ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "pread"); collector/io/wrappers.c: ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "pread64"); collector/io/wrappers.c: ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "pwrite"); collector/io/wrappers.c: ssize_t (realfunc)() = dlsym (RTLD_NEXT, "pwrite64"); collector/io/wrappers.c: ssize_t (realfunc)() = dlsym (RTLD_NEXT, "readv"); collector/io/wrappers.c: ssize_t (realfunc)() = dlsym (RTLD_NEXT, "writev"); collector/mem/wrappers.c:#define __USE_GNU / XXX forRTLD_NEXT on Linux */ collector/mem/wrappers.c: f_malloc = dlsym(RTLD_NEXT, "malloc"); collector/mem/wrappers.c: f_realloc = dlsym (RTLD_NEXT, "realloc"); collector/mem/wrappers.c: f_posix_memalign = dlsym (RTLD_NEXT, "posix_memalign"); collector/mem/wrappers.c: f_memalign = dlsym (RTLD_NEXT, "memalign"); collector/mem/wrappers.c: f_free = dlsym (RTLD_NEXT, "free"); grep: install/spack/externals: Is a directory

On 1/14/22 13:04, Jonathan R. Madsen wrote:

I took the patch and tried it with our survey tool. It compiles
but something seems to be wrong as we aren't getting our
performance data gathered. So, more investigation is needed by us
in our survey use case.

Does your survey tool use |RTLD_DEFAULT| or |RTLD_NEXT|? Because this patch should not affect any other functionality if y'all do not.

— Reply to this email directly, view it on GitHub https://github.com/LLNL/GOTCHA/pull/101#issuecomment-1013381535, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZ3NIIRRFW6TUTBCQFLSLUWBXTBANCNFSM5KZEFIKA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

jgalarowicz avatar Jan 14 '22 20:01 jgalarowicz

Just to be clear here.  The wrappers shown here do not use gotcha. In particular the mem/wrappers.c is completely replaced by a gotcha based version of mem wrappers which does not call dlsym at all and relies on gotcha to handle this kind of thing.  we create a dso that we ld_preload into an application if the user desires to collect some simple memory metrics and chooses to use the gotcha implementation.

The gotcha wrappers source is attached.

cheers, Don

On 1/14/22 2:44 PM, Jim Galarowicz wrote:

Hi Jonathan,

The gotcha usage section of survey isn't my code, but we appear to use RTLD_NEXT (grep below).

I just ran the application that was aborting previously, with our latest survey changes.

It appears that the abort, from not gathering performance data, we were seeing is gone.

I believe the gotcha patch is working for survey.

gotcha compiles and I've run several tests now where I get the expected results.

The workaround/fix looks good from our side.

Thanks

Jim G

grep: build/scripts/CMakeFiles: Is a directory collector/io/wrappers.c:#define __USE_GNU /* XXX forRTLD_NEXT on Linux */ collector/io/wrappers.c: *(void **) (&f_read) = dlsym(RTLD_NEXT, "read"); collector/io/wrappers.c: //ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "write"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "write"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "lseek"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "lseek64"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "open"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "open64"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "close"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "dup"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "dup2"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "creat"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "creat64"); collector/io/wrappers.c: *(void **) (&realfunc) = dlsym(RTLD_NEXT, "pipe"); collector/io/wrappers.c: ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "pread"); collector/io/wrappers.c: ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "pread64"); collector/io/wrappers.c: ssize_t (*realfunc)() = dlsym (RTLD_NEXT, "pwrite"); collector/io/wrappers.c: ssize_t (realfunc)() = dlsym (RTLD_NEXT, "pwrite64"); collector/io/wrappers.c: ssize_t (realfunc)() = dlsym (RTLD_NEXT, "readv"); collector/io/wrappers.c: ssize_t (realfunc)() = dlsym (RTLD_NEXT, "writev"); collector/mem/wrappers.c:#define __USE_GNU / XXX forRTLD_NEXT on Linux */ collector/mem/wrappers.c: f_malloc = dlsym(RTLD_NEXT, "malloc"); collector/mem/wrappers.c: f_realloc = dlsym (RTLD_NEXT, "realloc"); collector/mem/wrappers.c: f_posix_memalign = dlsym (RTLD_NEXT, "posix_memalign"); collector/mem/wrappers.c: f_memalign = dlsym (RTLD_NEXT, "memalign"); collector/mem/wrappers.c: f_free = dlsym (RTLD_NEXT, "free"); grep: install/spack/externals: Is a directory

On 1/14/22 13:04, Jonathan R. Madsen wrote:

I took the patch and tried it with our survey tool. It compiles
but something seems to be wrong as we aren't getting our
performance data gathered. So, more investigation is needed by us
in our survey use case.

Does your survey tool use |RTLD_DEFAULT| or |RTLD_NEXT|? Because this patch should not affect any other functionality if y'all do not.

— Reply to this email directly, view it on GitHub https://github.com/LLNL/GOTCHA/pull/101#issuecomment-1013381535, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZ3NIIRRFW6TUTBCQFLSLUWBXTBANCNFSM5KZEFIKA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

jgalarowicz avatar Jan 14 '22 20:01 jgalarowicz

Turns out using _dl_sym is more problematic than it being a hidden symbol in recent glibc versions. It turns out in some situations, when gotcha calls it with RTLD_NEXT and the next symbol doesn't exist, it will crash the application instead of just returning a null pointer.

jrmadsen avatar Apr 29 '22 19:04 jrmadsen