Improve Function performance

Open galchinsky opened this issue 11 years ago • 5 comments

This is a feature request or an RFC. Current C++ implementations of std::function usually store small closures inside the function object itself to avoid a heap allocation. As far as I can see, Clay's Function always allocates on the heap, and this could be changed.

galchinsky avatar Mar 08 '13 19:03 galchinsky

Modern allocators (like tcmalloc) are very efficient at allocating small objects, and Clay should probably have a core library that is easy to maintain rather than highly optimized.

stepancheg avatar Mar 09 '13 01:03 stepancheg

The representation of Function could be improved and retain a high-level, easy-to-maintain structure by being a variant Function (FunctionWithSmallClosure, FunctionWithHeapClosure).

jckarter avatar Mar 09 '13 17:03 jckarter

There's a simpler solution:

record SmallInPlace (
    data: Array[Byte, TypeSize(RawPointer)],
);

record LargeInHeap (
    data: RawPointer,
);

// not generic
variant MemoryHolderSmallInPlaceLargeInHeap (SmallInPlace, LargeInHeap);

allocateMemorySmallInPlaceLargeInHeap(T): MemoryHolderSmallInPlaceLargeInHeap =
    if (TypeSize(T) <= TypeSize(SmallInPlace().data))
        SmallInPlace()
    else
        LargeInHeap(allocateRawMemory(TypeSize(T)));

and then

record Function[In, Out] (
    obj: MemoryHolderSmallInPlaceLargeInHeap,
    ...
);

Function has only one implementation.

Note that this implementation (as well as the variant Function implementation) won't always be faster than the current implementation, because malloc is cheap but a branch misprediction isn't.

stepancheg avatar Mar 10 '13 14:03 stepancheg

Indeed, factoring out the memory holder is a good idea to avoid needless instantiation. I would guess though that, even if you have a fast malloc, locality and heap size efficiency would end up being bigger factors than branch misprediction in a larger application. That's why libc++ favors size over speed and large C++ projects like LLVM and WebKit make heavy use of custom in-place SmallVector/SmallString/SmallDenseMap/etc. containers. In the case of Function, there's an indirect call to an underlying function pointer anyway, which will probably be opaque to the branch predictor no matter what.

jckarter avatar Mar 10 '13 16:03 jckarter

Construction usually happens less often than invocation. That's why the cache friendliness of an object often matters more than some creation overhead.

galchinsky avatar Mar 10 '13 18:03 galchinsky
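
For readers more familiar with C++, the small-in-place / large-in-heap memory holder proposed in this thread could be sketched roughly as follows. This is only an illustration, not code from Clay or from any commenter: the class name `MemoryHolder`, the constant `kInlineSize`, and the methods `data()` and `onHeap()` are hypothetical, and the sketch assumes the requested alignment never exceeds what plain `operator new` provides.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical C++ counterpart of MemoryHolderSmallInPlaceLargeInHeap.
// It manages raw bytes only; constructing and destroying the closure
// stored in those bytes is left to the caller (e.g. a Function type).
class MemoryHolder {
public:
    // Same threshold as the Clay sketch: one pointer's worth of bytes.
    static constexpr std::size_t kInlineSize = sizeof(void*);

    MemoryHolder(std::size_t size, std::size_t align)
        : onHeap_(size > kInlineSize || align > alignof(void*)) {
        // Assumption of this sketch: over-aligned types are not supported.
        assert(align <= alignof(std::max_align_t));
        if (onHeap_) {
            storage_.heap = ::operator new(size);
        }
    }

    ~MemoryHolder() {
        if (onHeap_) {
            ::operator delete(storage_.heap);
        }
    }

    // Non-copyable: the holder does not know how to copy its contents.
    MemoryHolder(const MemoryHolder&) = delete;
    MemoryHolder& operator=(const MemoryHolder&) = delete;

    // Every access pays for this branch, which is the cost weighed
    // against the saved allocation in the discussion above.
    void* data() {
        return onHeap_ ? storage_.heap
                       : static_cast<void*>(storage_.inlineBytes);
    }

    bool onHeap() const { return onHeap_; }

private:
    union Storage {
        alignas(void*) unsigned char inlineBytes[kInlineSize];
        void* heap;
    } storage_;
    bool onHeap_;
};
```

Because the holder is not generic, a `Function`-like type built on top of it would be instantiated once per signature rather than once per closure size, which is the point made about avoiding needless instantiation. A caller would place its closure with placement new, for example `new (holder.data()) Closure(...)`, and would remain responsible for running the closure's destructor.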