Query.jl

Handy functions from dplyr

Open bramtayl opened this issue 7 years ago • 2 comments

Going through the dplyr manual, I see several functions that might be worth adding to Query.jl. These include sample, bind_rows, bind_cols, rename, mutate, slice, n, and top_n. I'm not sure they are all necessary, but some of them might be nice, and I could pitch in here.

bramtayl, Aug 24 '18 14:08

Oh, and all the different joins too.

bramtayl, Aug 24 '18 17:08

YES! I think that is actually the area where we could add the most value right now to Query.jl.

I have thought a lot about mutate and to some degree select, and not at all about the others. Here is my current thinking:

First, I think we should try to implement all the mutate and select variants in the front end only. I think it should be feasible that they all end up as @map calls under the hood, and in that way we actually don't have to add anything to QueryOperators.jl, or do any work on the backends.

Then, I think we could probably as a first step try to add features like that as new functions that manipulate NamedTuples, so that they can be used from within @map, before we start to add helper functions like @mutate and @select.
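To make the idea concrete, here is a minimal sketch of mutate and select expressed as plain per-row functions on NamedTuples. It uses Base `map` over a vector of rows as a stand-in for `@map`, so nothing here is Query.jl API; the names `rows`, `mutated`, and `selected` are made up for illustration.

```julia
# A tiny "table": a vector of NamedTuple rows (illustrative data only).
rows = [(a = 1, b = 2), (a = 3, b = 4)]

# mutate: keep every existing column and add c = a + b.
# Base `map` stands in for @map; the anonymous function is the part
# that a front-end macro would generate.
mutated = map(r -> merge(r, (c = r.a + r.b,)), rows)

# select: keep only the listed columns, in the listed order.
selected = map(r -> NamedTuple{(:b,)}(r), rows)
```

Because both operations are ordinary functions from one NamedTuple to another, they would not need any backend support, which seems to be the point of doing this in the front end.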

I think for starters, if we had a type-stable merge function for NamedTuples, it would go a long way. Say merge((a=1,b=2),(c=3,))==(a=1,b=2,c=3). Once we have that, we could add some syntax to {} to make it easier to use. For example, {a..., b..., x=3} could be translated to merge(a, b, (x=3,)) in the various Query.jl macros.
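For reference, a short sketch of what this looks like with `merge` from Base, which accepts NamedTuples in current Julia. The variable names are illustrative, and the last call only shows what the proposed `{a..., b..., x=3}` syntax might lower to; that lowering is the proposal above, not an existing Query.jl feature.

```julia
using Test  # only for @inferred

left  = (a = 1, b = 2)
right = (c = 3,)   # trailing comma required: (c = 3) is a parenthesized assignment

# @inferred throws if the return type cannot be inferred,
# so this doubles as a type-stability check.
merged = @inferred merge(left, right)

# On a name clash the right-most value wins.
overridden = merge(left, (b = 20,))

# What `{left..., right..., x=3}` might lower to.
# Merging three or more NamedTuples needs Julia 1.1 or later.
lowered = merge(left, right, (x = 3,))
```

One design point to settle for the `{}` syntax is whether duplicate names should silently take the right-most value, as `merge` does, or be an error.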

Another area would be selecting subsets of columns. We could either have something like startswith((foo1=1, bar=2, foo2=3), :foo)==(foo1=1,foo2=3), or something like (foo1=1, bar=2, foo2=3)[startswith(:foo)]==(foo1=1,foo2=3). I'm not sure which of these is better. In a query it might look like @map(startswith(_, :foo)) or @map(_[startswith(:foo)]). I think I like the first one better, but not sure... The second approach would be more in line with this, which probably would also be worthwhile... In general I think we need a lot more features to select columns, but we probably should iterate a bit with various designs?
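The two spellings can be compared side by side with a small sketch. Both `select_startswith` and the `StartsWith` wrapper type are hypothetical names invented here so that Base `startswith` is not overloaded; neither exists in Query.jl or Base. This version computes the kept names at run time, so unlike `merge` it is not type stable as written.

```julia
# Hypothetical helper: keep only the fields whose name begins with `prefix`.
function select_startswith(nt::NamedTuple, prefix::Symbol)
    keep = Tuple(n for n in keys(nt) if startswith(String(n), String(prefix)))
    return NamedTuple{keep}(nt)   # Base constructor that picks fields by name
end

# Hypothetical selector object enabling the indexing form nt[StartsWith(:foo)].
struct StartsWith
    prefix::Symbol
end
Base.getindex(nt::NamedTuple, s::StartsWith) = select_startswith(nt, s.prefix)

row = (foo1 = 1, bar = 2, foo2 = 3)

by_function = select_startswith(row, :foo)   # first form
by_indexing = row[StartsWith(:foo)]          # second form
```

The function form composes directly inside `@map`, while the indexing form leaves room for other selector types later; making either one type stable would need the names to be resolved at compile time, for example with a generated function.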

Maybe as a first step I should create queryverse/NamedTupleHelpers.jl, where we could play with some of these methods, and where they could have their home?

davidanthoff, Sep 01 '18 05:09