Transformation over a subset of rows

Open • andrus opened this issue 6 years ago • 1 comment

Allow mapping (or running whatever other type-specific ops we come up with per #77) over a subset of rows in a Series or DataFrame, and then merging the result back into the original Series / DataFrame.

This is similar to pandas loc, but over the immutable API.

Perhaps we could use a builder that maintains the row selection before applying a chain of operations. A simple example (in reality there are more permutations of operation input and output, such as a single column or a whole DataFrame):

// 1. finds rows where column "a" is > 3, 
// 2. for those multiplies "a" by 2
// 3. merges changes back to "a" (producing a new DataFrame of course)
df.loc("a", a -> a > 3).map(a -> a * 2).merge();
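To make the builder idea concrete, here is a minimal, self-contained sketch in plain Java. It is not DFLib API: the class `RowSubsetSketch`, the nested `Selection` builder, and the `loc` / `map` / `merge` signatures are all hypothetical, and a single column is modeled as an immutable `List<T>` rather than a real Series. It only illustrates how a row selection can be carried through a chain of operations and merged back without mutating the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch, NOT DFLib API: a column is modeled as an immutable List<T>.
final class RowSubsetSketch {

    // Builder that remembers the source values, the row selection,
    // and the operations accumulated so far for the selected rows.
    static final class Selection<T> {
        private final List<T> source;
        private final Predicate<T> condition;
        private final UnaryOperator<T> op;

        Selection(List<T> source, Predicate<T> condition, UnaryOperator<T> op) {
            this.source = source;
            this.condition = condition;
            this.op = op;
        }

        // Chains one more operation; returns a new builder, leaving this one unchanged.
        Selection<T> map(UnaryOperator<T> next) {
            UnaryOperator<T> prev = op;
            return new Selection<>(source, condition, v -> next.apply(prev.apply(v)));
        }

        // Produces a new list: selected rows are transformed, all others are copied as-is.
        List<T> merge() {
            List<T> result = new ArrayList<>(source.size());
            for (T v : source) {
                result.add(condition.test(v) ? op.apply(v) : v);
            }
            return List.copyOf(result);
        }
    }

    // Entry point: selects the rows whose value matches the condition.
    static <T> Selection<T> loc(List<T> source, Predicate<T> condition) {
        return new Selection<>(List.copyOf(source), condition, UnaryOperator.identity());
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 4, 5, 2);
        List<Integer> merged = loc(a, v -> v > 3).map(v -> v * 2).merge();
        System.out.println(merged); // [1, 8, 10, 2]
        System.out.println(a);      // [1, 4, 5, 2], the source is untouched
    }
}
```

In this sketch the condition is evaluated against the original values at merge time, so chained `map` calls always apply to the same set of rows. A real implementation would also need to handle multi-column selections and outputs of a different type, which the sketch deliberately leaves out.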

andrus Sep 02 '19 05:09

Note that the "over().partition().rank()" API (Window functions, #91) is semantically (if not functionally) very similar. We can create a similar API here that will either add extra columns or replace rectangular areas of the existing DataFrame.

andrus Mar 01 '20 00:03