Arrow.jl
Struct
I would like to take a shot at this one if you don’t mind. Just opening an issue to declare my intent to start on this hopefully in the next few days.
Great, thanks!
I don't know what you have planned, and I haven't really given this any thought yet, but one thing I'd like to recommend is that you consider that named tuples might be useful. These will be a core part of the language starting with 0.7. I don't actually know for sure whether they are relevant here, but it's a thought.
I was thinking more along the lines of any Julia struct.
Though I haven’t fully considered how that would work exactly.
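One possible shape for this, offered only as a rough sketch: split a vector of structs into one child vector per field using reflection. The `Point` type and the `to_columns` helper are hypothetical names used for illustration and are not part of Arrow.jl.

```julia
# Hypothetical example type; any concrete struct type would do.
struct Point
    x::Float64
    y::Float64
end

# Split a vector of structs into one child vector per field.
# Assumes T is a concrete struct type, so that `fieldnames(T)` is defined.
function to_columns(rows::AbstractVector{T}) where {T}
    fields = fieldnames(T)  # tuple of Symbols, e.g. (:x, :y)
    cols = map(f -> [getfield(r, f) for r in rows], fields)
    return NamedTuple{fields}(cols)
end

cols = to_columns([Point(1.0, 2.0), Point(3.0, 4.0)])
```

Here `cols.x` and `cols.y` are ordinary vectors, which is roughly the layout an Arrow struct uses for its children.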
Definitely yes, we'd ultimately want the ability to store any Julia struct.
One thing to keep in mind though: to get good performance you will have to make sure that you are not doing anything that triggers "extra" compilation. By this I mean, for example, calling a function with values that can only be determined at run time, where that function in turn calls a macro that defines a Julia struct. One place where this might be unavoidable, and which would be a really cool feature, is automatically defining a Julia struct for you when you deserialize an Arrow struct. But in general, things like this should be avoided.
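To illustrate the named tuple idea mentioned earlier in the thread, here is a minimal sketch of representing a deserialized struct without defining a new Julia struct through a macro or `eval`. The field names, the data, and the `getrow` helper are made up for illustration. Julia will still specialize methods for each new `NamedTuple` type it encounters, so this avoids defining types at run time rather than avoiding compilation altogether.

```julia
# Hypothetical schema and data, as they might be known only at run time.
field_names = ["id", "name"]
children = Any[[1, 2, 3], ["a", "b", "c"]]

# Build a NamedTuple of child vectors; no macro or `eval` is involved.
syms = Tuple(Symbol.(field_names))
columns = NamedTuple{syms}(Tuple(children))

# Row i is a NamedTuple with the same field names as the columns.
getrow(cols::NamedTuple, i::Integer) = map(v -> v[i], cols)

row = getrow(columns, 2)
```

With this layout, `row.id` and `row.name` give the fields of the second entry.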
Hi - how much work do you think needs to be done in order to read arrow structs? e.g. a typical "dataframe" struct with each entry's values being a vector of the same length.
@kcajf sorry, I missed your comment all that time ago. It's a little hard for me to guess how much effort that would be right now. I'm probably going to look into IPC first, and I think implementing structs may be a necessary part of that.
No problem. Reading the streaming IPC is actually what I was trying to do above. I've since read up on Arrow enough to know the right terminology!
An aside: there is an issue on this project discussing plans for merging into the official apache project. Has there been any further discussion on other channels about whether / how that would happen?