vector

Choosing between performance and safety is inconvenient

Open dolio opened this issue 9 years ago • 4 comments

This issue is for discussing ideas related to the problem I'm going to outline below. I've discussed it with @ekmett before, but never really wrote it down anywhere.

The issue is this: to get performance, you don't want all your operations to be bounds checked. Further, your algorithm is probably designed to not index outside of the array, and if it does, it's a bug. But, occasionally, you will have bugs, and if you are using unsafe operations, what you'll get is a segfault, which is awful.

vector has kind of a solution to this issue, but it's not very good. You can write your algorithm with all unsafe operations, and if you start getting segfaults, you can use the -fUnsafeChecks and -fInternalChecks Cabal flags to turn on extra bounds checking in those unsafe operations. Then you will get nice errors when your bug is triggered. The problem is that these are build-time options for vector, so to debug you must rebuild vector and everything that depends on it.
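For concreteness, here is a minimal sketch of the two styles of indexing being compared. It assumes a default build of vector, where the ordinary operations are bounds checked and the unsafe ones are not. The names sumChecked and sumUnchecked are made up for illustration.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as U

-- | Sum with checked indexing: (U.!) validates every index and raises an
-- exception (rather than segfaulting) if the loop bound is ever wrong.
sumChecked :: U.Vector Int -> Int
sumChecked v = go 0 0
  where
    n = U.length v
    go !acc i
      | i >= n    = acc
      | otherwise = go (acc + v U.! i) (i + 1)

-- | The same loop with U.unsafeIndex: no bounds check in a default build of
-- vector, so an off-by-one here reads arbitrary memory or crashes.
sumUnchecked :: U.Vector Int -> Int
sumUnchecked v = go 0 0
  where
    n = U.length v
    go !acc i
      | i >= n    = acc
      | otherwise = go (acc + U.unsafeIndex v i) (i + 1)

main :: IO ()
main = do
  let v = U.fromList [1 .. 10]
  print (sumChecked v)
  print (sumUnchecked v)
```

Both functions compute the same result on correct code. They differ only in what happens when the index arithmetic is buggy.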

By contrast, a C++ library (I've heard) might allow you to dynamically link to a debug version of a library that will perform bounds checks. This probably doesn't require recompilation of anything (as long as both the normal and debug libraries were already built).

The question is: how can we achieve something more convenient in Haskell? One approach is to have normal and Unsafe modules that export exactly the same API, so that they can be swapped out easily (this probably requires CPP to automate, though). This could be taken even further by splitting into separate packages that implement the same modules with the same API (not sure this is a good idea).
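As a rough single-file sketch of the swap-by-CPP idea: the macro name UNSAFE_INDEXING and the functions index and dot are hypothetical, chosen only for illustration. In a real library the two definitions would more likely live in two modules with identical export lists.

```haskell
{-# LANGUAGE CPP #-}
import qualified Data.Vector.Unboxed as U

-- One name, two implementations. UNSAFE_INDEXING is a hypothetical macro;
-- compiling with -DUNSAFE_INDEXING selects the unchecked variant.
index :: U.Unbox a => U.Vector a -> Int -> a
#ifdef UNSAFE_INDEXING
index = U.unsafeIndex
#else
index = (U.!)
#endif
{-# INLINE index #-}

-- Client code is written once against 'index' and never mentions which
-- variant it gets.
dot :: U.Vector Double -> U.Vector Double -> Double
dot xs ys = sum [ index xs i * index ys i | i <- [0 .. n - 1] ]
  where
    n = min (U.length xs) (U.length ys)

main :: IO ()
main = print (dot (U.fromList [1, 2, 3]) (U.fromList [4, 5, 6]))
```

Switching between the variants still means recompiling every module that uses index, so this only makes the toggle easier to reach.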

Another thought I've had is a module full of rewrite rules from foo to unsafeFoo. Importing it would eliminate all bounds checks.
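A minimal sketch of what such a rule could look like, assuming hypothetical wrappers checkedIndex and uncheckedIndex (vector does not ship these names). Here the rule sits in the same file; in the idea above it would live in its own module and take effect when imported. Rewrite rules only fire when optimization is enabled (for example with -O).

```haskell
import qualified Data.Vector.Unboxed as U

-- Hypothetical checked wrapper. NOINLINE keeps calls to it visible to the
-- simplifier so that the rewrite rule below has something to match.
checkedIndex :: U.Vector Int -> Int -> Int
checkedIndex v i = v U.! i
{-# NOINLINE checkedIndex #-}

-- Hypothetical unchecked counterpart.
uncheckedIndex :: U.Vector Int -> Int -> Int
uncheckedIndex = U.unsafeIndex
{-# INLINE uncheckedIndex #-}

-- When rules are enabled, every call to checkedIndex is rewritten to
-- uncheckedIndex, which removes the bounds check.
{-# RULES
"checkedIndex/uncheckedIndex" forall v i. checkedIndex v i = uncheckedIndex v i
  #-}

-- Example client, written entirely against the checked operation.
firstThree :: U.Vector Int -> Int
firstThree v = checkedIndex v 0 + checkedIndex v 1 + checkedIndex v 2

main :: IO ()
main = print (firstThree (U.fromList [10, 20, 30, 40]))
```

For in-bounds indices the result is the same whether or not the rule fires. Only the behaviour on out-of-bounds indices changes.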

The big problem with all of these is that they don't work with indirection. If I write code using vector, I can swap out for an unsafe module. If I write code that uses code that uses vector, I can't force them to swap between safe and unsafe operations. Every library has to do the same work of providing safe and unsafe analogues, when what we really want is to say, "link everyone to the bounds checked version."

dolio avatar Jul 24 '16 21:07 dolio

I think this articulates a capability that may need to be addressed at the GHC level, but it seems like a valid idea.

cartazio avatar Jul 24 '16 21:07 cartazio

@dolio mentioned me so I could look at this ticket and comment on whether Backpack can help. But the first question to ask is whether what you are asking for is even possible. With cross-module inlining, it's not. C/C++ has the same problem: if you change a header file, you need to recompile.

So, in principle, if you turned off inlining, you could get ABI-compatible versions. But since this is all about performance to begin with, that is not going to do you much good.

Assuming you will recompile, it's easy enough to use Backpack to set up a generic vector interface with two implementations of that interface, one checked and one unchecked. It would be a modest UI improvement over a difficult-to-toggle flag.

ezyang avatar Jul 24 '16 21:07 ezyang

Hmm, devil's advocate: could cabal project files make it easy to do that toggle in a client project?

cartazio avatar Jul 24 '16 21:07 cartazio

Talking with @ezyang, cabal's new build stuff can apparently address this pretty well (it won't go as far as relinking without recompiling, but I don't think that's the biggest deal, and it's the hardest problem to solve).

A cabal.project is able to manage flags in dependencies, and new-build will cache all builds. So you will only ever need to build vector once for each set of flags, and each package that depends on vector also only needs to be built once for each set of vector flags.
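As a sketch of what that could look like, a client project's cabal.project might contain something like the fragment below. The flag spellings are assumed to correspond to the -fUnsafeChecks and -fInternalChecks flags mentioned earlier (Cabal treats flag names case-insensitively), and the packages line assumes the client package sits in the project root.

```cabal
-- cabal.project for a client project (sketch)
packages: .

-- Turn on the extra checks in the vector dependency for this project only.
package vector
  flags: +unsafechecks +internalchecks
```

Removing the flags line switches back to the default build of vector. With cached builds, each configuration should only need to be compiled once.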

So, cabal new-build has made vector's previously not-very-good solution to this problem much more palatable.

dolio avatar Jul 24 '16 22:07 dolio