Deedle
More merging frames -
From Clement.
Suppose you have two frames like this:
        a  b                a  b
     --------            --------
x ->    1  2        x ->    5  6
y ->    3  4        y ->    7  8
It would be nice to do this:
          a       b
        1   2   1   2
     ------------------
x ->    1   2   5   6
y ->    3   4   7   8
Or, you could do the same along the row axis as well ...
One way to achieve this might be via Frame.zip and then a "flatten" in one direction...
Also, this suggests the more general idea of a zipN that forms lists of aligned elements, as well as a way to flatten length-N collections.
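To make the zipN / flatten idea a bit more concrete, here is a rough sketch in plain F#. It does not use Deedle at all: ToyFrame, zipN and flatten are hypothetical names, and a "frame" is modelled as a map keyed by (row, column), which is only an assumption for illustration. Alignment here is inner (a cell survives only if every frame has it).

```fsharp
// Hypothetical toy model, NOT Deedle's representation:
// a "frame" is a map from (row key, column key) to a value.
type ToyFrame<'V> = Map<string * string, 'V>

/// zipN: for every (row, column) key present in all frames,
/// collect the aligned values into a list, in frame order.
let zipN (frames: ToyFrame<'V> list) : ToyFrame<'V list> =
    match frames with
    | [] -> Map.empty
    | first :: rest ->
        first
        |> Map.toSeq
        |> Seq.choose (fun (key, v) ->
            // Keep the key only if every other frame has it too (inner alignment).
            let others = rest |> List.map (Map.tryFind key)
            if List.forall Option.isSome others
            then Some (key, v :: List.choose id others)
            else None)
        |> Map.ofSeq

/// flatten: turn each length-N cell into N cells whose column key
/// is (original column, 1-based position in the list).
let flatten (zipped: ToyFrame<'V list>) : Map<string * (string * int), 'V> =
    zipped
    |> Map.toSeq
    |> Seq.collect (fun ((row, col), values) ->
        values |> List.mapi (fun i v -> (row, (col, i + 1)), v))
    |> Map.ofSeq

// The two frames from the example above, in the toy representation.
let toy1 : ToyFrame<int> =
    Map.ofList [ ("x", "a"), 1; ("x", "b"), 2; ("y", "a"), 3; ("y", "b"), 4 ]
let toy2 : ToyFrame<int> =
    Map.ofList [ ("x", "a"), 5; ("x", "b"), 6; ("y", "a"), 7; ("y", "b"), 8 ]

let merged = [ toy1; toy2 ] |> zipN |> flatten
```

In this sketch the second-level key groups the values by original column, so row x holds 1 under ("a", 1) and 5 under ("a", 2). Flattening along the row axis would be the same function with the roles of row and column swapped.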
Actually, this looks close:
let df1 = Frame.ofValues [("x","a", 1); ("y", "a", 3); ("x", "b", 2); ("y", "b", 4)] ;;
let df2 = Frame.ofValues [("x","a", 5); ("y", "a", 7); ("x", "b", 6); ("y", "b", 8)] ;;
df1 |> Frame.zip (fun x y -> x, y) df2 |> Frame.expandCols ["a"; "b"; "c"; "d"];;
But I notice that the argument of expandCols doesn't seem to be used (bug?)
Regarding the behavior of expandCols, the parameter specifies the names of the columns that you want to expand. So for example:
> df1 |> Frame.zip (fun x y -> x, y) df2 |> Frame.expandCols ["a"];;
val it : Frame<string,string> =
a.Item1 a.Item2 b
x -> 5 1 (6, 2)
y -> 7 3 (8, 4)
... but if this is not what you were expecting, then perhaps we need to improve the naming to make this obvious. Currently, there is expandAllCols and expandCols (which hopefully suggests what the argument means). I'd call the function you were (I think?) expecting expandAllColsAs. But I'm open to better suggestions!
That said, I quite like the idea of having a function that aligns two frames and generates a second-level int index for overlaps! Do you (or Clement) have any suggestions for the naming?
In fact, we could have expandLevel that does the same for expansion - the current use of First.Second.Third... is based on the previous BM frame and is somewhat arbitrary. But expandLevel could only expand one level at a time because the number of levels cannot be captured in the type system.
I understand expandCols better now, thanks for the explanation! I filed a new "issue" that highlights either a bug or a misunderstanding.
I don't think we need to tack on a renaming function (i.e., no need for expandAllColsAs).
Maybe for the new function that I illustrated, we can call that Frame.interleave?
We can almost accomplish it with current primitives:
Frame.zip (fun x y -> x, y) df1 df2 |> Frame.expandAllCols 1 |> [rename columns]
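For the [rename columns] step, a minimal sketch might look like the following. It assumes the expanded tuple columns are named like a.Item1 and a.Item2 (as in the expandCols output earlier in this thread) and turns each name into a (column, int) pair. toInterleavedKey is a hypothetical helper, not something that exists in Deedle, and the fallback level 0 for names that don't match is an arbitrary choice.

```fsharp
/// Hypothetical helper: parse an expanded column name such as "a.Item1"
/// into the two-level key ("a", 1). Names without a trailing ".ItemN"
/// are returned unchanged with level 0 (arbitrary fallback for this sketch).
let toInterleavedKey (name: string) : string * int =
    let marker = ".Item"
    match name.LastIndexOf(marker, System.StringComparison.Ordinal) with
    | -1 -> name, 0
    | i ->
        let suffix = name.Substring(i + marker.Length)
        match System.Int32.TryParse(suffix) with
        | true, n -> name.Substring(0, i), n
        | _ -> name, 0

// Applied to the column names that the zip + expand pipeline would produce:
let renamed =
    [ "a.Item1"; "a.Item2"; "b.Item1"; "b.Item2" ]
    |> List.map toInterleavedKey
```

If the Deedle version at hand has Frame.mapColKeys (I believe it does, but treat that as an assumption), the helper could presumably be plugged in as the last stage of the pipeline above.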
PS I'm not sure we need an expandLevel ... and I think the First, Second, ... naming is actually Column1, Column2, etc. and only exists in our compatibility layer?
Another issue I was talking to Arseniy about today - we have ok support for hierarchical indexes in F# but it's relatively incompatible with C# as tuples are not equivalent in the two languages. Do you have any thoughts about how we might unify this? Could be a separate issue to open.
I created a new issue and this one is now just to add the implementation of Frame.interleave as suggested.