More merging frames

Open adamklein opened this issue 10 years ago • 5 comments

From Clement.

Suppose you have two frames like this:

     a    b               a    b
     --------             ---------
x -> 1    2          x -> 5    6
y -> 3    4          y -> 7    8

It would be nice to do this:

     a         b
     1    2    1    2
     ----------------
x -> 1    2    5    6
y -> 3    4    7    8

Or, you could do the same along the row axis as well ...

One way to achieve this might be via Frame.zip and then a "flatten" in one direction...

Also, this suggests the more general idea of a zipN that forms lists of aligned elements, as well as a way to flatten length-N collections.
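To make the zipN and flatten idea concrete, here is a minimal sketch in plain F#. It assumes a series can be modelled as a `Map` from key to value; `zipN` and `flatten` are hypothetical names used for illustration and are not Deedle functions.

```fsharp
// Hypothetical model: a "series" is a Map from key to value. None of these
// names exist in Deedle; they only illustrate the zipN/flatten idea.

/// Align N series on the union of their keys; missing values become None.
let zipN (series: Map<'K, 'V> list) : Map<'K, 'V option list> =
    series
    |> Seq.collect (fun s -> s |> Map.toSeq |> Seq.map fst)
    |> Seq.distinct
    |> Seq.map (fun k -> k, series |> List.map (Map.tryFind k))
    |> Map.ofSeq

/// Flatten each length-N list into entries keyed by (key, 1-based position),
/// skipping positions where the value was missing.
let flatten (zipped: Map<'K, 'V option list>) : Map<'K * int, 'V> =
    zipped
    |> Map.toSeq
    |> Seq.collect (fun (k, vs) ->
        vs
        |> List.mapi (fun i v -> v |> Option.map (fun x -> (k, i + 1), x))
        |> List.choose id)
    |> Map.ofSeq

// Column "a" of the two frames above, keyed by row.
let a1 = Map.ofList [ "x", 1; "y", 3 ]
let a2 = Map.ofList [ "x", 5; "y", 7 ]
let flat = [ a1; a2 ] |> zipN |> flatten
// flat maps ("x", 1) to 1, ("x", 2) to 5, ("y", 1) to 3 and ("y", 2) to 7
```

Keeping the intermediate values as options means a key that is missing from one input simply produces no entry at that position instead of failing the whole alignment.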

adamklein, Mar 28 '14 17:03

Actually, this looks close:

let df1 = Frame.ofValues [("x","a", 1); ("y", "a", 3); ("x", "b", 2); ("y", "b", 4)] ;;
let df2 = Frame.ofValues [("x","a", 5); ("y", "a", 7); ("x", "b", 6); ("y", "b", 8)] ;;
df1 |> Frame.zip (fun x y -> x, y) df2 |> Frame.expandCols ["a"; "b"; "c"; "d"];;

But I notice that the argument of expandCols doesn't seem to be used (bug?)

adamklein, Mar 28 '14 17:03

Regarding the behavior of expandCols, the parameter specifies the names of columns that you want to expand. So for example:

> df1 |> Frame.zip (fun x y -> x, y) df2 |> Frame.expandCols ["a"];;

val it : Frame<string,string> =
     a.Item1 a.Item2 b      
x -> 5       1       (6, 2) 
y -> 7       3       (8, 4) 

... but if this is not what you were expecting, then perhaps we need to improve the naming to make this obvious. Currently, there is expandAllCols and expandCols (which hopefully suggests what the argument means). I'd call the function you were (I think?) expecting expandAllColsAs. But I'm open to better suggestions!

That said, I quite like the idea of having a function that aligns two frames and generates a second-level int index for overlaps! Do you (or Clement) have any suggestions for the naming?

In fact, we could have expandLevel that does the same for expansion - the current use of First.Second.Third... is based on the previous BM frame and is somewhat arbitrary. But expandLevel could only expand one level at a time because the number of levels cannot be captured in the type system.

tpetricek, Mar 29 '14 23:03

I understand expandCols better now, thanks for the explanation! I filed a new "issue" that highlights either a bug or a misunderstanding.

I don't think we need to tack on a renaming function (i.e., no need for expandAllColsAs).

Maybe for the new function that I illustrated, we can call that Frame.interleave ?

We can almost accomplish it with current primitives:

Frame.zip (fun x y -> x, y) df1 df2 |> Frame.expandAllCols 1 |> [rename columns]
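For comparison, the proposed behaviour can also be sketched without zip, by tagging the column keys and merging. This is only an illustration: `interleave` is a hypothetical helper, not an existing Deedle function, and the sketch assumes a Deedle version that provides `Frame.mapColKeys`, `Frame.merge` and `Frame.sortColsByKey`.

```fsharp
#r "nuget: Deedle"
open Deedle

let df1 = Frame.ofValues [ ("x", "a", 1); ("y", "a", 3); ("x", "b", 2); ("y", "b", 4) ]
let df2 = Frame.ofValues [ ("x", "a", 5); ("y", "a", 7); ("x", "b", 6); ("y", "b", 8) ]

// Hypothetical helper, not part of Deedle: tag each frame's column keys with
// the frame's position (1 or 2), merge the two frames, then sort the columns
// so that columns sharing a name end up next to each other.
let interleave (first: Frame<'R, 'C>) (second: Frame<'R, 'C>) : Frame<'R, 'C * int> =
    let tag i frame = frame |> Frame.mapColKeys (fun c -> c, i)
    Frame.merge (tag 1 first) (tag 2 second)
    |> Frame.sortColsByKey

let interleaved = interleave df1 df2
```

Because the column keys become `(name, position)` tuples, no separate renaming step is needed, although the tuple keys run into the F#/C# compatibility question for hierarchical indexes.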

adamklein, Apr 03 '14 22:04

PS I'm not sure we need an expandLevel ... and I think the First Second ... naming is actually Column1 Column2 etc. and only exists in our compatibility layer?

Another issue I was talking to Arseniy about today - we have ok support for hierarchical indexes in F# but it's relatively incompatible with C# as tuples are not equivalent in the two languages. Do you have any thoughts about how we might unify this? Could be a separate issue to open.

adamklein, Apr 03 '14 23:04

I created a new issue and this one is now just to add the implementation of Frame.interleave as suggested.

tpetricek, May 21 '14 23:05