Frames
Adding a derived column
Is it possible to add a derived column to the frame? I see that there is a frameCons that can add a column to the frame, but I am not sure how to go about defining a type signature for the new column, as all the other types were generated by TH and I have no idea how that works.
For example I have a table with number of girls and boys enrolled each year at primary school and I want to add a derived column for the total number of kids enrolled each year.
Adding a derived column is a fairly common operation in R so being able to duplicate it easily would be a plus.
Thanks Riaan
I wrote up how to do this, but then added some things to hopefully make it easier and pushed a new version of Frames to hackage. In general, I use the :i command quite a lot in cabal repl to inspect the row type I get from some data. You can also use :browse to see all the generated things.
So here's how you might approach your example problem given the new Frames-0.1.1.0:
{-# LANGUAGE DataKinds, FlexibleContexts, TemplateHaskell, TypeOperators #-}
import Frames
import Lens.Family
tableTypes "Row" "data/SchoolEnrollment.csv"
loadRows :: IO (Frame Row)
loadRows = inCoreAoS $ readTable "data/SchoolEnrollment.csv"
totalEnrollment :: Frame Row
                -> Frame (Record ("Total" :-> Int ': RecordColumns Row))
totalEnrollment = fmap (\r -> frameConsA (r^.girls + r^.boys) r)
type Row' = Record ("Total" :-> Int ': RecordColumns Row)
I put the Row' definition there at the end as you will want to name that new row type if you refer to it more than once. Given that synonym, we'd have totalEnrollment :: Frame Row -> Frame Row'.
That looks awesome, I tried building it though and get the following error:
src/Frames/RecF.hs:37:15:
Not in scope: type constructor or class ‘Applicative’
Perhaps you meant ‘RecApplicative’ (imported from Data.Vinyl)
src/Frames/RecF.hs:38:34:
Not in scope: ‘pure’
Perhaps you meant ‘rpure’ (imported from Data.Vinyl)
cabal: Error: some packages failed to install:
Frames-0.1.1.0 failed during the building phase. The exception was:
ExitFailure 1
I think you meant to add (Applicative, pure) to the imports for RecF. Works if I edit source code to add it.
Works great, and it is really nice that the other functions are not broken by the additional column. Would it be possible to also add an Applicative version of frameSnoc so that the derived columns can be appended to the end?
Thanks for the quick response.
Sorry about my screwup with the missing Applicative import. I use GHC 7.10 and didn't wait for Travis to build on 7.8. Frames-0.1.1.1 is up on hackage fixing that.
Sure, we can add frameSnocA, but I had some trouble with the type when looking at your example here. The problem is that writing frameSnoc requires that the caller applies the Col newtype constructor (this is what tags each column type with a name).
The problem is that the type level ++ leads to ambiguity. It's worth playing with a bit more to see if this can be improved, but I didn't find a solution in my quick look. Maybe the injective type family features coming to GHC will be able to help there, or maybe carefully annotating the operation can help.
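For readers unfamiliar with this kind of ambiguity, here is a minimal sketch using only base. It may not be the exact error hit with frameSnocA; it only illustrates why a type-level append is hard for inference. The ++ family below is a stand-in written for this sketch, and splitA, splitB, snocP and unsnocP are hypothetical names that are not part of Vinyl or Frames.

```haskell
{-# LANGUAGE DataKinds, PolyKinds, TypeFamilies, TypeOperators #-}
module Main where

import Data.Proxy (Proxy (..))
import Data.Type.Equality ((:~:) (Refl))

-- Type-level list append as a closed type family. This is a stand-in
-- written for this sketch, not the definition used by Vinyl or Frames.
type family (xs :: [k]) ++ (ys :: [k]) :: [k] where
  '[]       ++ ys = ys
  (x ': xs) ++ ys = x ': (xs ++ ys)

-- Two different splits reduce to the same list, so knowing the result of
-- an append does not tell the type checker what the two inputs were.
splitA :: ('[Int] ++ '[Bool, Char]) :~: '[Int, Bool, Char]
splitA = Refl

splitB :: ('[Int, Bool] ++ '[Char]) :~: '[Int, Bool, Char]
splitB = Refl

-- Hypothetical snoc on proxies: this direction is fine because rs and x
-- both appear as plain arguments.
snocP :: Proxy rs -> Proxy x -> Proxy (rs ++ '[x])
snocP _ _ = Proxy

-- By contrast, a signature such as
--   unsnocP :: Proxy (rs ++ '[x]) -> Proxy x
-- mentions rs only under (++), so GHC reports it as ambiguous.

main :: IO ()
main = splitA `seq` splitB `seq` putStrLn "both splits type-check"
```

With cons, the result type mentions rs directly (x ': rs), so the checker can always read the tail back off the result; with an append family it cannot, which is one reason cons-based signatures tend to be easier to work with.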
No problem, it forced me to look at what you have changed and hopefully increased my Haskell knowledge a little bit.
W.r.t. frameSnocA, I won't pretend to understand the problem. My Haskell type-fu stops at understanding Monads. But reading between the lines, it sounds like it is possible but not straightforward.
I can work around it for now by using fmap (select [pr|Boys,Girls,Total|]) totalEnrollment and/or constructing the type signature by hand, but this means that I cannot really be agnostic to the full data structure. Still, in most cases where I am creating a derived column I have a fairly clear idea of the fields I am considering, so it is not too bad.
I did get the feeling, though, that it was getting a bit slower doing it this way, but maybe this is of the same order as snoc? I am on toy data at the moment, but the real data structures will be much larger, so if this workaround comes with a heavy penalty compared to snoc then it would be better to investigate other options, including just leaving all the derived columns at the start :-)
Thanks anyway for your help and all the work you put into this library. I am enjoying the challenge of trying to understand how it works.
Ah, so that's another direction that I de-emphasized here in order to make the row type change clear. You can also write,
te :: (Boys ∈ rs, Girls ∈ rs) => FrameRec rs -> FrameRec ("Total" :-> Int ': rs)
te = fmap (\r -> frameConsA (r^.girls + r^.boys) r)
And now you are agnostic to the rest of the row.
The cons'ing onto one end of the list is annoying, but I'm sure if everything were built around a snoc-list structure we'd have symmetric annoyances. This aspect of the design is inherited from Vinyl, but the cons list is primary in FP languages, so it's the data structure of least surprise.
It's worth thinking about if this can be improved in Frames. Maybe it's a user experience thing, and just letting the programmer write the types as a snoc list would be better. I'm not sure if that can be done, as it may well run into more problems of not being able to teach the type checker facts about lists. I'll try to find some time to play around with that to see what works out; we could probably offer something like FrameR (Row ':| "Total" :-> Int), but if we wanted the row to print out in reverse order, it'd have to be a newtype rather than just a type family.
I would want the row to index and print out like Total is the last column. As this is the way R presents dependent columns it would be the most intuitive I think.
Thanks for clarifying how to make the function more generic. I was just thinking agnostic to the other columns in the table that are generated from the csv file. Will te also work across other structures that have the columns ("boys" :-> Int) and ("girls" :-> Int)?
By the way, how do you get the ∈ symbol in haskell? I have to copy and paste it every time because I have no idea how to type it. I use emacs as editor.
I ran into another problem with the derived columns: when I try to get the data from the derived column using view, I get a not in scope error.
How do I access this column?
- Yes, the te example I gave works with any structure that has Boys and Girls columns.
- Regarding ∈, you can also write out an application of RElem. I enter the symbol in emacs by running toggle-input-method (C-\), and enabling the TeX input method. This lets you write TeX commands such as \in to produce symbols.
- Since we're creating a new column, no code has been generated. I added the following to our running program as an example,
type Total = "Total" :-> Int
total :: (Total ∈ rs) => Functor f => LensLike' f (Record rs) Int
total = rlens [pr|Total|]
getTotal :: (Total ∈ rs) => Record rs -> Int
getTotal = rget total
Awesome, all working now, still a bit of boilerplate to shuffle the columns around but not too bad. Great tip on the input method as well.
Thanks