root icon indicating copy to clipboard operation
root copied to clipboard

Improve RArrowDS and introduce RCombinedDS

Open ktf opened this issue 6 years ago • 11 comments

ktf avatar Feb 18 '19 20:02 ktf

I've updated the PR with today's developments. Apart from the changes requested I've introduced a new index which allows performing all sort of block diagonal combinations between the indices of two tables, where each block represent a category of objects, e.g. tracks belonging to the same event. This is still not fully tested, but it should give an idea of the possibilities.

ktf avatar Feb 20 '19 22:02 ktf

OK let's get this merged. We need two things:

  • find a better name than "combined". I suggest RCrossJoinDS or RCartesianProductDS.
  • fix the conflict. I can certainly do that myself.

@ktf let me know, please!

Axel-Naumann avatar Nov 26 '19 15:11 Axel-Naumann

Notice this is not limited to provide Cross Join / Cartesian Products kind of joins (i.e. MxN) but it's fully configurable via the index, that's why I chose "Combined". In principle one could have an index which repeats ten times the same pair of "rows" or whatever. RAssociativeDS?

ktf avatar Nov 27 '19 11:11 ktf

See also thread we had with @pcanal at https://github.com/root-project/root/pull/3443/files#r258137063

ktf avatar Nov 27 '19 11:11 ktf

Hi, I'm picking this up (thanks @linev for assigning me).

#3433 should be merged first. It just requires rebasing and fixing a trivial conflict.

For this PR: I will review the code one last time before approving, mostly to be familiar with the code myself. I don't expect any issue. I think RCombinedDS is not a bad name given that it really provides arbitrary combinations of datasources. What do you think @Axel-Naumann ? @ktf proposed RAssociativeDS as an alternative but I don't think it's clearer.

eguiraud avatar Mar 04 '20 11:03 eguiraud

Discussing with @ktf, we decided to add RCombinedDS as a friend of RDataSource to avoid the #define protected public that is now required to call GetColumnReadersImpl from RCombinedDS. I will do that and resolve the few conflicts this PR now has before merging.

I don't think the other comments still apply (right @pcanal ?)

eguiraud avatar Sep 07 '20 12:09 eguiraud

...thank you @phsft-bot

eguiraud avatar Sep 07 '20 12:09 eguiraud

@pcanal also needs a couple more things, see my last comment. It's on my to do list.

eguiraud avatar Sep 11 '20 17:09 eguiraud

@ktf, what should we do with this PR? Should we closed it?

guitargeek avatar Apr 08 '24 01:04 guitargeek

Up to you what you want to do. We have decided to go a different way than RDataFrame for our analysis framework, so it's not critical for us. Still I think some way of combining RDataSources with JOIN like operations is probably something useful.

ktf avatar Apr 11 '24 10:04 ktf

Ok, thanks for the clarification! That's true, the RCombinedDS is nice. I guess the RDataFrame devs should take your PR from here. But right now we don't enable arrow in the CI. Let's first do that @martamaja10 @vepadulano @dpiparo

guitargeek avatar May 09 '24 14:05 guitargeek