root
root copied to clipboard
Improve RArrowDS and introduce RCombinedDS
I've updated the PR with today's developments. Apart from the changes requested I've introduced a new index which allows performing all sort of block diagonal combinations between the indices of two tables, where each block represent a category of objects, e.g. tracks belonging to the same event. This is still not fully tested, but it should give an idea of the possibilities.
OK let's get this merged. We need two things:
- find a better name than "combined". I suggest
RCrossJoinDSorRCartesianProductDS. - fix the conflict. I can certainly do that myself.
@ktf let me know, please!
Notice this is not limited to provide Cross Join / Cartesian Products kind of joins (i.e. MxN) but it's fully configurable via the index, that's why I chose "Combined". In principle one could have an index which repeats ten times the same pair of "rows" or whatever. RAssociativeDS?
See also thread we had with @pcanal at https://github.com/root-project/root/pull/3443/files#r258137063
Hi, I'm picking this up (thanks @linev for assigning me).
#3433 should be merged first. It just requires rebasing and fixing a trivial conflict.
For this PR: I will review the code one last time before approving, mostly to be familiar with the code myself. I don't expect any issue. I think RCombinedDS is not a bad name given that it really provides arbitrary combinations of datasources. What do you think @Axel-Naumann ? @ktf proposed RAssociativeDS as an alternative but I don't think it's clearer.
Discussing with @ktf, we decided to add RCombinedDS as a friend of RDataSource to avoid the #define protected public that is now required to call GetColumnReadersImpl from RCombinedDS. I will do that and resolve the few conflicts this PR now has before merging.
I don't think the other comments still apply (right @pcanal ?)
...thank you @phsft-bot
@pcanal also needs a couple more things, see my last comment. It's on my to do list.
@ktf, what should we do with this PR? Should we closed it?
Up to you what you want to do. We have decided to go a different way than RDataFrame for our analysis framework, so it's not critical for us. Still I think some way of combining RDataSources with JOIN like operations is probably something useful.
Ok, thanks for the clarification! That's true, the RCombinedDS is nice. I guess the RDataFrame devs should take your PR from here. But right now we don't enable arrow in the CI. Let's first do that @martamaja10 @vepadulano @dpiparo