Arrow.jl performance on 0.6, `reinterpret` and safety

Open ExpandingMan opened this issue 7 years ago • 4 comments

Achieving full memory safety requires extensive use of `reinterpret`. On 0.6, it is impossible to reinterpret only a subset of an array, so Arrow.jl always makes an unnecessary copy: roughly, what happens is `reinterpret(T, A[i:j])`. Since `A[i:j]` can't be a view, it must be allocated. We therefore only get good performance when the entire underlying data `A` is reinterpreted, i.e. in no real cases.
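A minimal sketch of the copy described above, written in Julia 1.x syntax rather than 0.6. The buffer contents and the names `buf`, `vals` and `before` are made up for illustration and are not taken from Arrow.jl.

```julia
# Raw byte buffer standing in for the underlying Arrow data (hypothetical contents).
buf = collect(UInt8, 1:16)

# Indexing with a range allocates a new array, so this reinterprets a copy of bytes 5..12.
vals = reinterpret(Int32, buf[5:12])

# Snapshot the values, then modify the original buffer.
before = collect(vals)
buf[5] = 0xff

# `vals` is backed by the copy, so the write to `buf` is not visible through it.
@assert collect(vals) == before
```

The allocation made by `buf[5:12]` is the cost this issue is about: it is paid on every access that covers only part of the buffer.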

There is no solution to this in 0.6. In 0.7, it will be possible to reinterpret views, and at that time we will have to investigate the performance of reinterpret.
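For comparison, a sketch of what reinterpreting a view looks like on Julia 0.7 and later. Again, the buffer and variable names are illustrative assumptions, and this says nothing about how fast the result is, which is the open question here.

```julia
# Raw byte buffer standing in for the underlying Arrow data (hypothetical contents).
buf = collect(UInt8, 1:16)

# `view` does not copy, and on 0.7+ `reinterpret` accepts the resulting SubArray.
vals = reinterpret(Int32, view(buf, 5:12))

# Writes to the original buffer are visible through `vals`, so no copy was made.
buf[5] = 0xff
expected = reinterpret(Int32, buf[5:8])[1]  # same four bytes, read via an explicit copy
@assert vals[1] == expected
```

Accesses through `vals` remain bounds-checked, which is the safety property the pointer-based approach gives up.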

So, my decision is to sacrifice safety and go back to pointers until 0.7.

Feb 22 '18 15:02 ExpandingMan

FWIW, I agree with your decision to go with the former (unsafe) implementation on 0.6 and to use `reinterpret` on views starting with 0.7.

Feb 22 '18 18:02 sglyon

Thanks, I value that feedback quite a lot because I was initially feeling quite unsure about it.

Having done a little more research, I think it's pretty clear that this is the only option. We already have problems with Missings performance, at least until 0.7.

Feb 22 '18 18:02 ExpandingMan

Back to pointers as the default. We use the `unsafe_getvalue`, `unsafe_isnull` and `unsafe_construct` methods.

All `setindex!` and `setnull!` methods are still safe and will probably remain so, as there aren't really any performance issues with them, as far as I know.
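To illustrate the general idea of pointer-based reads, here is a rough sketch in Julia 1.x syntax. The helper `unsafe_read_at` is hypothetical and is not the implementation of `unsafe_getvalue` or any other Arrow.jl method; it only shows the kind of unchecked access that trades safety for avoiding the copy.

```julia
# Hypothetical helper, NOT part of Arrow.jl: read the `i`-th value of type `T`,
# counting from the 1-based byte position `offset` in `buf`.
# No bounds checking is performed, so a bad `offset` or `i` reads invalid memory.
function unsafe_read_at(::Type{T}, buf::Vector{UInt8}, offset::Integer, i::Integer) where {T}
    # Keep `buf` rooted so the garbage collector cannot free it while the raw pointer is in use.
    GC.@preserve buf begin
        p = convert(Ptr{T}, pointer(buf, offset))
        unsafe_load(p, i)
    end
end

buf = collect(UInt8, 1:16)

# Read the first Int32 starting at byte 5 without allocating a slice.
x = unsafe_read_at(Int32, buf, 5, 1)
@assert x == reinterpret(Int32, buf[5:8])[1]
```

Nothing here stops a caller from passing an `offset` or `i` past the end of `buf`, which is exactly the safety being sacrificed.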

Feb 23 '18 14:02 ExpandingMan

Note to revisit this after Julia 1.0.1 thanks to Keno's fix.

Aug 18 '18 00:08 ExpandingMan
