Chris Elrod
Interesting, I hadn't seen that issue before. I assume you saw the short term solution: https://github.com/tpapp/PushVectors.jl I'm curious if that'd help LoopVectorization's compile time performance (it uses a lot of...
Which type info? Just the element type, or everything, including size and strides? This is what I get on an M1 Mac: ```julia julia> using LoopVectorization, StaticArrays julia> x =...
Memory management is pretty central to performance, so it'd be great if we can find a way to make optimizations easy. As a brief aside, `StrideArrays.jl` is implemented as: 1....
> I was trying to solve this with preserve(f, data) that allows different garbage collection and pointer constructors per type and device Yeah, we could define specific overloads for `preserve`....
For a first step to start experimenting with it and checking performance, it'd be nice to have a way to distinguish between when we should take approach 1 vs 2,...
> I still need to put more thought into how we get from instantiate(f, data) to the fully defined function. If we can get a size and eltype `T`, we...
I'll generalize `contiguous_axis` and `contiguous_batch_size` by letting them return tuples to represent block arrays.
The more interesting change will be making this work with `ArrayInterface.getindex` and `ArrayInterface.setindex!`. I haven't checked yet whether `contiguous_axis` and `contiguous_batch_size` are actually supported there. This will be a breaking change,...
I need to walk back my earlier comment. I'd want equally sized blocks. I think `BlockArray(rand(4, 4), [2,2], [1,1,2])` is out of scope of at least what `contiguous_axis` and `contiguous_batch_size`...
The last block can be shorter.
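A minimal sketch of the block-size rule discussed above, assuming "equally sized blocks, except that the last block can be shorter" is the whole criterion. Both `uniform_blocks` and `blockindex` are hypothetical helpers written for illustration. They are not part of ArrayInterface, BlockArrays, or LoopVectorization. Under this rule the row blocks `[2,2]` from `BlockArray(rand(4, 4), [2,2], [1,1,2])` qualify, while the column blocks `[1,1,2]` do not, because the final block is longer than the others.

```julia
# Hypothetical helper (not part of ArrayInterface or BlockArrays):
# returns true when every block has the same length, except that the
# final block may be shorter.
function uniform_blocks(sizes::AbstractVector{<:Integer})
    isempty(sizes) && return true
    b = first(sizes)
    # Every block except the last must match the first block's length.
    for s in @view sizes[1:end-1]
        s == b || return false
    end
    # The last block may be shorter, but never longer (and never empty).
    return 0 < last(sizes) <= b
end

# Hypothetical helper: with uniform blocks of length `b`, a 1-based index
# `i` along the axis maps to (block number, offset within block), both
# 1-based, using only integer division and remainder.
blockindex(i::Integer, b::Integer) = (div(i - 1, b) + 1, rem(i - 1, b) + 1)

uniform_blocks([2, 2])     # row blocks: accepted
uniform_blocks([1, 1, 2])  # column blocks: rejected
uniform_blocks([3, 3, 1])  # shorter last block: accepted
```

Requiring uniform blocks is what keeps the index arithmetic this simple: a single block length per axis is enough to locate any element, whereas arbitrary block sizes would need a lookup over cumulative block boundaries.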