Deedle
Making it easier to map over a Frame
This just came up:
Suppose you have a Frame of floats (or of anything, really). We'd like to be able to map over all the values in the frame without having to decompose it into series and then recompose it. Right now it's incumbent on the user to figure out how to do this properly: Frame.cols, Frame.rows, Frame.mapRowValues etc. all work in terms of ObjectSeries, which aren't very amenable to typed operations unless you use .As<T>(). For example,
    let stockReturn =
        stockPrices.ColumnKeys
        |> Seq.map (fun k -> (k, stockPrices.GetSeries<float>(k) |> log))
        |> Frame.ofColumns
Or
    let stockReturn =
        stockPrices.GetAllSeries<float>()
        |> Seq.map (fun kv -> kv.Key, kv.Value |> log)
        |> Frame.ofColumns
Or, I guess
    df
    |> Frame.mapRowValues (fun os -> os.As<float>() |> log)
    |> Frame.ofRows
Although this last one only works if every value in the Frame is a float.
The point is, there are too many choices.
If the Frame holds heterogeneous data, there are two sensible options for columns that can't be converted (I think we should check for type convertibility): drop them, or pass them through unchanged. I think we should do the latter, as we've chosen to in the past (but not always! e.g. Frame.sum).
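To pin down the pass-through semantics, here is a minimal sketch in plain F#. It deliberately does not use Deedle: `Column`, `ToyFrame` and `mapFloatValues` are hypothetical names, and the frame is modelled as a list of typed columns. Columns convertible to float are mapped; anything else comes back unchanged.

```fsharp
// Toy model of a heterogeneous frame (hypothetical, NOT Deedle's representation):
// each column carries a key and a tag saying what type its values have.
type Column =
    | FloatCol of float[]
    | IntCol of int[]
    | StringCol of string[]

type ToyFrame = (string * Column) list

/// Hypothetical frame-level map: applies `f` to every value that can be
/// converted to float and passes non-convertible columns through unchanged.
let mapFloatValues (f: float -> float) (frame: ToyFrame) : ToyFrame =
    frame
    |> List.map (fun (key, col) ->
        match col with
        | FloatCol xs -> key, FloatCol(Array.map f xs)
        // ints convert to float, so they are converted first and then mapped
        | IntCol xs -> key, FloatCol(Array.map (float >> f) xs)
        // strings are not convertible: keep the column as it is
        | StringCol _ -> key, col)

let prices: ToyFrame =
    [ ("MSFT", FloatCol [| 1.0; 2.0 |])
      ("Volume", IntCol [| 10; 20 |])
      ("Ticker", StringCol [| "x"; "y" |]) ]

let logged = mapFloatValues log prices
```

Dropping non-convertible columns instead would just mean filtering out the `StringCol` case rather than returning it.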
Also, I'd like users not to have to think about whether to decompose row-wise or column-wise.
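For an element-wise map the orientation genuinely doesn't matter: mapping row by row and mapping column by column give the same table, so a frame-level map is free to pick either internally. A small plain-F# check of that claim, using a `float[,]` as a stand-in for an all-float frame (`mapViaRows` and `mapViaCols` are illustrative names, not Deedle functions):

```fsharp
// A 2D array standing in for an all-float frame (toy model, not Deedle).
let table = array2D [ [ 1.0; 2.0; 3.0 ]; [ 4.0; 5.0; 6.0 ] ]

// Decompose into rows, map each row, then recompose.
let mapViaRows (f: float -> float) (t: float[,]) =
    let nRows, nCols = Array2D.length1 t, Array2D.length2 t
    let rows = Array.init nRows (fun i -> Array.init nCols (fun j -> t.[i, j]))
    rows |> Array.map (Array.map f) |> array2D

// Decompose into columns, map each column, then recompose (transposing back).
let mapViaCols (f: float -> float) (t: float[,]) =
    let nRows, nCols = Array2D.length1 t, Array2D.length2 t
    let cols = Array.init nCols (fun j -> Array.init nRows (fun i -> t.[i, j]))
    let mapped = cols |> Array.map (Array.map f)
    Array2D.init nRows nCols (fun i j -> mapped.[j].[i])

// What a frame-level map would do directly, with no decomposition at all.
let direct = Array2D.map sqrt table
let byRows = mapViaRows sqrt table
let byCols = mapViaCols sqrt table
```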
A whole bunch of other series operations would also be worth making easy at the frame level (say, by broadcasting them across columns), such as:
- sampling/resampling/chunking/lookups
- stats
- taking/skipping
- zipping
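As a rough illustration of "broadcasting across columns", a hypothetical `broadcastCols` helper can lift any column-level operation to the whole frame. The frame is modelled as a `Map` from column key to a `float[]` column; this is a plain F# sketch, not Deedle's API, and the series operations are stood in for by FSharp.Core `Array` functions.

```fsharp
// Toy homogeneous frame: column key -> column values (hypothetical model).
type FloatFrame = Map<string, float[]>

/// Hypothetical helper: apply a column-level operation to every column.
/// The result 'T may be another column (take/skip/chunk) or a single
/// value per column (a statistic such as the mean).
let broadcastCols (op: float[] -> 'T) (frame: FloatFrame) : Map<string, 'T> =
    frame |> Map.map (fun _ col -> op col)

/// Hypothetical column-by-column zip of two frames, for keys present in both.
/// Array.zip throws if the paired columns have different lengths.
let zipCols (left: FloatFrame) (right: FloatFrame) : Map<string, (float * float)[]> =
    left
    |> Map.filter (fun key _ -> right.ContainsKey key)
    |> Map.map (fun key col -> Array.zip col right.[key])

let prices: FloatFrame =
    Map.ofList [ ("A", [| 1.0; 2.0; 3.0; 4.0 |]); ("B", [| 5.0; 6.0; 7.0; 8.0 |]) ]

let firstTwo = prices |> broadcastCols (Array.take 2)       // taking
let afterOne = prices |> broadcastCols (Array.skip 1)       // skipping
let chunks = prices |> broadcastCols (Array.chunkBySize 2)  // chunking
let means = prices |> broadcastCols Array.average           // stats: one value per column
let doubled = prices |> broadcastCols (Array.map (fun x -> x * 2.0))
let zipped = zipCols prices doubled                         // zipping
```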
+1. I have to do this when rounding a pivot table to 2 decimal places. Unstacked series, where all columns have the same type, seem like a good case to cover; heterogeneous frames may be the 20% case.
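In the pivot-table case every column has the same type, so a frame-level map reduces to mapping every cell. A plain-F# sketch, with the unstacked table modelled as a nested `Map` (row key to column key to value) rather than a Deedle frame; `mapCells` and the sample keys are hypothetical:

```fsharp
open System

// Unstacked pivot table, all values float (toy model, not a Deedle frame).
let pivot =
    Map.ofList
        [ ("2013-01", Map.ofList [ ("North", 1.2345); ("South", 2.5678) ])
          ("2013-02", Map.ofList [ ("North", 3.14159); ("South", 0.999) ]) ]

/// Hypothetical helper: apply `f` to every cell, keeping row and column keys.
let mapCells (f: float -> float) (table: Map<string, Map<string, float>>) =
    table |> Map.map (fun _ row -> row |> Map.map (fun _ v -> f v))

// Math.Round(value, digits) uses banker's rounding (MidpointRounding.ToEven).
let rounded = pivot |> mapCells (fun v -> Math.Round(v, 2))
```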
Agreed!
I think having a map function that applies a specified transformation to all values would be quite useful. Ideally, we'd also allow this with the $ operator (which currently works on series).
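F# lets a type define its own `$` operator as a static member, so a frame-like type could accept `f $ frame` the same way a series does. A minimal sketch on a hypothetical `Grid` class wrapping a `float[,]`; it is not Deedle's `Frame`, and Deedle's actual operator definition may differ:

```fsharp
// Hypothetical frame-like type; NOT Deedle's Frame.
type Grid(values: float[,]) =
    member this.Values = values
    /// `f $ grid` applies `f` to every value, mirroring the series operator.
    static member ($) (f: float -> float, grid: Grid) =
        Grid(Array2D.map f grid.Values)

let grid = Grid(array2D [ [ 1.0; 4.0 ]; [ 9.0; 16.0 ] ])
let halve (x: float) = x / 2.0
let halved = halve $ grid
```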
The Frame module API should certainly be extended to support other series operations (column-wise). I think we already cover some of the ones @adamklein mentioned, though certainly not all.