Parla.py

VECs don't work in interactive Python (REPL)

Open insertinterestingnamehere opened this issue 5 years ago • 14 comments

The Python interpreter segfaults at startup when the modified glibc and forwarding libraries for VECs are used and the interpreter is run in interactive mode. Running scripts works fine. It's currently unclear what's causing this.

After mulling this one over for a while, it occurred to me that there's a small chance that this could be an ABI incompatibility issue since we're relying on compatibility across a wide span of glibc versions. We should test with an unpatched build of the glibc version our patches apply to before pursuing other options. IIRC, the conda packages are built against the glibc version in CentOS 6 right now. Even though the backcompat guarantees of glibc are generally reliable, it's possible that something minor slipped through since that's roughly a ten-year span of software versions. This would be an issue for any conda package, but we might be the first to see it since Linux distros don't usually keep up with the latest glibc version and users don't usually keep up with the latest Linux distro. It's also possible that this is an issue with using a glibc other than the default one.

Main point: we should test with an unpatched glibc to make sure that this even has anything to do with our code.
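
When comparing machines, or a custom build against the system one, it may help to record which glibc version the interpreter is actually running against. The following is a minimal sketch, assuming a Linux system with glibc; `runtime_glibc_version` and `parse_version` are helper names made up for illustration. It only reports a version number and cannot tell whether a given build carries the VEC patches.

```python
import ctypes
import os
import re


def runtime_glibc_version():
    """Return the glibc version of the running process as a string, or None."""
    try:
        # On glibc systems confstr reports something like "glibc 2.27".
        value = os.confstr("CS_GNU_LIBC_VERSION")
        if value:
            return value.split()[-1]
    except (AttributeError, ValueError, OSError):
        pass
    try:
        # Fall back to asking the already-loaded C library directly.
        func = ctypes.CDLL(None).gnu_get_libc_version
        func.restype = ctypes.c_char_p
        return func().decode()
    except (AttributeError, OSError, TypeError):
        # Not glibc (or not Linux): nothing to report.
        return None


def parse_version(version):
    """Turn a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


if __name__ == "__main__":
    print("runtime glibc:", runtime_glibc_version())
```

Running this once under the stock interpreter and once under the custom glibc would at least confirm that the intended library is the one being picked up.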

Yah, if this is broken with an unpatched glibc, I propose we clear it from the milestone.

VECs definitely require a patched glibc, so I'm not sure what you are getting at here.

arthurp • Apr 12 '21 15:04

I'm saying that there's some small chance the REPL segfault could be because of a glibc ABI compatibility issue. The glibc used for conda packages is ancient. We can test that it's not anything to do with our patches by testing the REPL with an unpatched but still custom-compiled glibc.

Though I will point out that nothing forces us to use conda. We don't have all that many dependencies. We could easily use recent Ubuntu packages or something and maybe the REPL would work in that case. That said, I don't think getting VECs working in the REPL is very important, so I'm fine with taking this off the milestone.

arthurp • Apr 12 '21 17:04

Yah, REPL doesn't really matter much for VECs. OTOH, I'll probably prepare an unpatched glibc build anyway to use as a sanity check to verify if something's even our problem.
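
One possible way to run such a sanity check without touching the system libc is to start the interpreter through the custom build's dynamic loader. The sketch below only assembles the command line. The install prefix, the extra library directory, and the `loader_command` helper are all hypothetical, and this is not necessarily how the VEC setup launches Python.

```python
import sys


def loader_command(loader, lib_dirs, program, args=()):
    """Build an argv that runs `program` through an explicit dynamic loader."""
    # `ld.so --library-path DIRS PROGRAM [ARGS...]` makes the loader search
    # the colon-separated DIRS instead of using LD_LIBRARY_PATH.
    return [loader, "--library-path", ":".join(lib_dirs), program, *args]


# Hypothetical install prefix for an unpatched glibc build.
PREFIX = "/opt/glibc-unpatched"

cmd = loader_command(
    loader=PREFIX + "/lib/ld-linux-x86-64.so.2",
    # The second directory is an example location for the interpreter's
    # other shared libraries; it will differ per distro or conda env.
    lib_dirs=[PREFIX + "/lib", "/usr/lib/x86_64-linux-gnu"],
    program=sys.executable,
    args=["-i"],  # force interactive mode, the case that segfaults
)
print(" ".join(cmd))
```

If the REPL still crashes when launched this way against an unpatched build, that would point away from the VEC patches themselves.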

I prefer to use conda if possible because it ties in nicely with our ABI compatibility story, but if we end up having to restrict the glibc version packages are built with then oh well.

I'm not seeing this behavior on the servers for Keshav's group now that they've been updated to Ubuntu 18.04.

That's weird. I had it and I was running something like 19.04 (though I don't remember exactly the version).

arthurp • Apr 20 '21 21:04

Weird. I did have to turn off the --enable-systemtap flag to get it to build on these particular machines, but pretty much nothing has changed since last time.

It's also possible I'm not exercising enough of the features to cause the segfault. I remember it being pretty immediate, but there's not a documented reproducer script.

Okay, I can reproduce this now. I'm not 100% sure if it's because I recompiled something or if I just wasn't testing this case for some reason, but now running

from parla.multiload import multiload
with multiload():
    import numpy

causes the segfault. The last thing it shows in the logs is "Opening library to stub: NULL" from https://github.com/ut-parla/Parla.py/blob/master/runtime_libs/stub_library.h#L27 which seems like it could indicate something wrong.
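
Since there is no documented reproducer yet, one option might be a small driver that runs a candidate snippet in a child interpreter and reports whether the child died from SIGSEGV. This is only a sketch: `classify_exit` and `run_snippet` are hypothetical helper names, not part of Parla. The child is started with `-X faulthandler` so that CPython dumps a Python-level traceback to stderr if it crashes.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Describe how a child process ended, given subprocess's return code."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_snippet(code, python=sys.executable, timeout=120):
    """Run `code` in a fresh interpreter; return (status, captured stderr)."""
    proc = subprocess.run(
        [python, "-X", "faulthandler", "-c", code],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return classify_exit(proc.returncode), proc.stderr.decode(errors="replace")


# Example use: pass the failing lines as a string and inspect the result.
#   status, stderr = run_snippet(open("snippet.py").read())
#   print(status)
#   print(stderr)
```

Checking such a script into the repository would make it easier to tell whether a later change to the build or to glibc actually altered the behavior.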

Weirdly enough, it's now segfaulting at interpreter startup for me. Once again, I'm not sure what's changed. I haven't rebuilt glibc.

Sorry. I think I installed a malevolent presence in the VEC code. Hard to remove now. Shrug. ;-)

arthurp • Apr 28 '21 21:04

We're doing enough crazy stuff in the VEC code to have weird stuff happen, but somehow it continues to surprise me.