Making it easier to map over a Frame

Open adamklein opened this issue 10 years ago • 2 comments

This just came up -

Suppose you have a Frame of floats (or anything, really). We'd like to be able to map over all the values in the frame without having to decompose it into series and then recompose it. At the moment it's incumbent on the user to figure out how to do this properly: Frame.cols, Frame.rows, Frame.mapRowValues, etc. all give a series of ObjectSeries, which aren't quite amenable to typed operations unless you use .As<T>. E.g.,

let stockReturn  = 
  stockPrices.ColumnKeys 
  |> Seq.map (fun k -> (k, stockPrices.GetSeries(k) |> log)) 
  |> Frame.ofColumns

Or

let stockReturn  = 
  stockPrices.GetAllSeries<float>()
  |> Seq.map (fun kv -> kv.Key, kv.Value |> log)
  |> Frame.ofColumns

Or, I guess

df
|> Frame.mapRowValues (fun os -> os.As<float>() |> log)
|> Frame.ofRows

Although this last one requires every value in the Frame to be a float.

The point is, there are too many choices ...

If the Frame contains heterogeneous data, it seems to make sense either to drop non-convertible columns (I think we should check for type convertibility) or to pass them through unchanged. I think we should do the latter, as we've chosen in the past (but not always! e.g. Frame.sum).

Also I'd like users not to have to think about decomposing row-wise vs column-wise.
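As a rough sketch only, the column-wise snippet above could be wrapped so the caller never sees the decomposition. The name mapFloatValues is hypothetical (not an existing Deedle function), and the sketch assumes GetAllSeries<float>() skips columns it cannot convert, so it shows the "drop" option, not the "pass through" one:

```fsharp
open Deedle  // assumes the Deedle assembly is already referenced

// Hypothetical helper, not part of Deedle: apply f to every value of every
// column that can be read as float.
// Assumption: GetAllSeries<float>() omits non-convertible columns, so those
// columns are dropped from the result instead of being passed through.
let mapFloatValues (f: float -> float) (frame: Frame<'R, 'C>) =
  frame.GetAllSeries<float>()
  |> Seq.map (fun kv -> kv.Key, kv.Value |> Series.mapValues f)
  |> Frame.ofColumns

// Usage, with stockPrices as in the snippets above:
// let stockReturn = stockPrices |> mapFloatValues log
```

Passing the remaining columns through untouched would need an extra step that merges them back in, which this sketch leaves out.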

A whole bunch of other series operations would also be useful to expose at the frame level (say, by broadcasting across columns), such as:

  • sampling/resampling/chunking/lookups
  • stats
  • taking/skipping
  • zipping

adamklein, Mar 14 '14 19:03

+1. I have to do this when rounding a pivot table to 2 decimals. Unstacked series where all columns are the same type seem like a good case to cover; heterogeneous frames may be the 20% case.

evilpepperman, Apr 30 '14 16:04

Agreed!

I think having a map function that applies a specified transformation to all values would be quite useful. Ideally, we'd also allow this with the $ operator (which currently works on series).

The Frame module API should certainly be extended to support other series operations (column-wise), although I think we already cover some of those @adamklein mentioned (but certainly not all).

tpetricek, May 02 '14 17:05