Arrow.jl
Arrow.jl copied to clipboard
performance on 0.6, `reinterpret` and safety
To achieve full memory safety requires extensive use of reinterpret
. On 0.6, it is impossible to reinterpret
only a subset of an array. This causes Arrow.jl to always make an unnecessary copy, i.e. roughly what happens is reinterpret(T, A[i:j])
. Since the A[i:j]
can't be a view, it must be allocated. Therefore, we only get good performance in cases where the entire underlying data A
is reinterpreted, i.e. no real cases.
There is no solution to this in 0.6. In 0.7, it will be possible to reinterpret
views, and at that time we will have to investigate the performance of reinterpret
.
So, my decision is to sacrifice safety and go back to pointers until 0.7.
FWIW I agree with your decision to choose former (unsafe) implementation on 0.6 and use reinterpret on views starting with 0.7
Thanks, I value that feedback quite a lot because I was initially feeling quite unsure about it.
As I've done a little more research I think it's pretty clear that this is the only option. We already have problems with Missings
performance at least until 0.7.
Back to pointers as default. We use the unsafe_getvalue
, unsafe_isnull
and unsafe_construct
methods.
All setindex!
and setnull!
methods are still safe and will probably remain so, as there aren't really any performance issues with them, as far as I know.
Note to revisit this after Julia 1.0.1 thanks to Keno's fix.