ManualMemory.jl
Approach to `alloc`
I'd love to have an alloc method in this package, but I'm not aware of any consensus on how this should be approached in Julia (besides Libc.malloc). Packages with related functionality include Blob.jl, StructIO.jl, and an unregistered ManualMemory.jl package. I'm sure there are others, along with plenty of relevant discussions on Zulip and in other issues.
I'm hoping we can converge on some well-defined goals here for functionality and possibly a path to implementation. So here are some goals I was thinking of:
- Better performance than `Libc.malloc` (see https://github.com/JuliaLang/julia/issues/24909#issuecomment-349795863)
- Some form of GC
- Ability to map allocated memory to immutable types.
I'm not entirely sure what this will require. Other packages have plenty of code for mapping memory to Julia types that could be built on, but I think the only way to allocate memory the GC is aware of is to use `alloca` from LLVM, and I was never able to get that to return unique pointers.
Edit: Looks like there's an LLVM based implementation here.
I think GCs would have to be implemented at the compiler level.
Julia's GC isn't aware of/doesn't handle alloca, but IIRC Julia's compiler is what handles converting memory allocations into allocas when the memory doesn't escape.
I think "better performance than `Libc.malloc`" is a reasonable goal for manually managing memory, particularly when we can adapt the allocator to specific problems / types of memory allocations.
E.g., it'd be easy to make one faster by only allowing allocating a certain size.
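As one illustration of that idea, a pool that only hands out blocks of a single size can skip general-purpose bookkeeping entirely. A minimal sketch (the `FixedPool`, `palloc!`, and `pfree!` names are hypothetical):

```julia
# Hypothetical sketch: a pool that only hands out fixed-size blocks.
# Fixing the size turns allocation into popping an index off a free list.
struct FixedPool
    blocksize::Int
    buffer::Vector{UInt8}   # one contiguous slab backing every block
    free::Vector{Int}       # 1-based block numbers still available
end

FixedPool(blocksize::Int, nblocks::Int) =
    FixedPool(blocksize, Vector{UInt8}(undef, blocksize * nblocks), collect(nblocks:-1:1))

# Returns a pointer to a free block, or C_NULL if the pool is exhausted.
function palloc!(pool::FixedPool)
    isempty(pool.free) && return Ptr{UInt8}(C_NULL)
    i = pop!(pool.free)
    pointer(pool.buffer) + (i - 1) * pool.blocksize
end

# Freeing pushes the block index back; we recover it from the pointer.
function pfree!(pool::FixedPool, p::Ptr{UInt8})
    i = div(p - pointer(pool.buffer), pool.blocksize) + 1
    push!(pool.free, Int(i))
    nothing
end
```

In real code the backing buffer would need to stay rooted (e.g. via `GC.@preserve`) while pointers into it are live; the sketch glosses over that.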
One fairly easy option I'd like is letting it allocate "stacks" we can use manually with a bump allocator. This would be great whenever you have a function that wants to use a lot of temporaries; it could request such an allocator, and then allocate all the temporaries on that.
One possible version is to allow a maximum of 128 vectors to be in use at a time, with something like:
```julia
struct BumpAlloc
    memory::Vector{Vector{UInt8}}
    mask::Base.Threads.Atomic{UInt128}
    BumpAlloc() = new([UInt8[] for _ in 1:128], Threads.Atomic{UInt128}(typemax(UInt128)))
end
```
The mask indicates whether each vector is in use; `leading_zeros` gives the zero-based index of the first open slot. We can use atomics to allocate/free the vectors so that parallel programs don't accidentally take the same one.
This should be relatively fast, and ideally each vector returned will be used for a large number of "bump allocations" within the program, i.e. each function call should try to use the same one for every data structure it allocates.
Callers would need a way to estimate how much memory they'll actually require, so that they can `resize!` the vector they receive if it is too small.
Alternatively, we could pick an upper size limit.
We could use Mmap to mmap a huge chunk of virtual memory. Only memory we actually touch gets mapped to physical hardware.
Then we should be able to set an upper size limit of a few gigs each. Or, better yet, set the size limit when constructing the BumpAlloc.
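A rough sketch of that reservation idea using the stdlib `Mmap` (the 1 GiB size here is just for illustration, not a proposed default):

```julia
using Mmap

# Reserve a large anonymous region of virtual memory. Pages are only
# backed by physical memory once they are actually touched.
arena = Mmap.mmap(Vector{UInt8}, 1 << 30)  # 1 GiB reservation

arena[1] = 0x01    # first page gets faulted in here
arena[end] = 0x02  # last page faulted in; everything between stays virtual
```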
Allocating means setting a mask bit to 0 (`leading_zeros` can quickly find the index of an open slot).
Freeing means setting the associated mask bit back to 1, so we'd have to keep track of that index.
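That claim/free protocol could be sketched as a CAS loop over the mask (the `acquire!`/`release!` names are hypothetical, and this assumes `Atomic{UInt128}` is supported on the target platform):

```julia
using Base.Threads: Atomic, atomic_cas!, atomic_or!

# Convention from above: bit i (counting from the top) set = slot i is free,
# so leading_zeros gives the zero-based index of the first open slot.

# Claim the first open slot; returns its zero-based index, or nothing if full.
function acquire!(mask::Atomic{UInt128})
    while true
        m = mask[]
        m == zero(UInt128) && return nothing   # all 128 slots in use
        i = leading_zeros(m)                   # first free slot
        bit = UInt128(1) << (127 - i)
        # Try to clear the bit; retry if another thread raced us.
        atomic_cas!(mask, m, m & ~bit) == m && return i
    end
end

# Freeing sets the slot's bit back to 1.
release!(mask::Atomic{UInt128}, i::Int) = atomic_or!(mask, UInt128(1) << (127 - i))
```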
Using the bump allocator itself is cheap: allocating just increments the returned pointer, and freeing is a no-op.
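A minimal sketch of the bump allocation itself, tracking an offset into a single buffer (all names here are hypothetical):

```julia
# "Allocating" just advances an offset; individual frees are no-ops,
# and the whole stack is reset at once when the function is done.
mutable struct Bump
    buf::Vector{UInt8}
    offset::Int
end
Bump(n::Int) = Bump(Vector{UInt8}(undef, n), 0)

# Reserve nbytes; returns the 1-based starting index into buf,
# or throws if the buffer is exhausted.
function bump!(b::Bump, nbytes::Int)
    b.offset + nbytes <= length(b.buf) || error("bump allocator exhausted")
    start = b.offset + 1
    b.offset += nbytes
    start
end

reset!(b::Bump) = (b.offset = 0; b)
```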
I'm all in favor of having more ways of allocating memory, and `BumpAlloc` would be useful. I was also looking through some internals and thought `jl_gc_alloc` might be worth mentioning in case anyone is willing to get their hands dirty down the road.
Also, we might have ImmutableArrays in Base soonish: https://github.com/JuliaLang/julia/pull/41777.