dh-core

analyze: add usage example(s)

Open ocramz opened this issue 7 years ago • 8 comments

Possibly a binary in the app/ folder with an end-to-end workflow. Then we can move anything good that comes out of this back into the main library.

ocramz, Oct 30 '18 17:10

One possible use case (from https://www.reddit.com/r/haskell/comments/a50xpr/datahaskell_solve_this_small_problem_to_fill_some/ )

The problem

Averaged across persons, excluding legal fees, how much money had each person spent by time 6?

item                , price
---------------------------
computer            , 1000
car                 , 5000
legal fees (1 hour) , 400

date , person , item-bought         , units-bought
--------------------------------------------------
7    , bob    , car                 , 1
5    , alice  , car                 , 1
4    , bob    , legal fees (1 hour) , 20
3    , alice  , computer            , 2
1    , bob    , computer            , 1

It would be extra cool if you provided both an in-memory and a streaming solution.

Principles|operations it illustrates

Predicate-based indexing|filtering. Merging (called "joining" in SQL). Within- and across-group operations. Sorting. Accumulation (what Data.List calls "scanning"). Projection (both the "last row" and the "mean" operations). Statistics (the "mean" operation).

Solution and proposed algorithm (it's possible you don't want to read this)

The answer is $4000. That's because by time 6, Bob had bought 1 computer ($1000) and 20 hours of legal work (excluded), while Alice had bought a car ($5000) and two computers ($2000). In total they had spent $8000, so the across-persons average is $4000.

One way to compute that would be to:

  • Delete any purchase of legal fees.
  • Merge price and purchase data.
  • Compute a new column, "money-spent" = units-bought * price.
  • Group by person.
  • Within each group: Sort by date in increasing order.
  • Compute a new column, "accumulated-spending" = running total of money spent.
  • Keep the last row with a date no greater than 6; drop all others.
  • Across groups, compute the mean of accumulated spending.
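The steps above could be sketched in memory with plain lists and base only. This is a rough illustration, not dh-core code: the `Purchase` and `Spend` records, the `meanSpendingBy` function and the association-list price table are all names made up for this example. It also assumes that a person with no purchase by the cutoff is dropped from the mean (as the "keep the last row" step implies) rather than counted as having spent 0.

```haskell
module Main (main) where

import Data.Function (on)
import Data.List     (groupBy, sortOn)

-- Hypothetical row type for the purchases table (not part of dh-core).
data Purchase = Purchase
  { date   :: Int
  , person :: String
  , item   :: String
  , units  :: Int
  } deriving (Show)

-- Hypothetical row type for the merged table: who spent how much, and when.
data Spend = Spend
  { spender :: String
  , spentOn :: Int
  , amount  :: Int
  } deriving (Show)

prices :: [(String, Int)]
prices =
  [ ("computer", 1000)
  , ("car", 5000)
  , ("legal fees (1 hour)", 400)
  ]

purchases :: [Purchase]
purchases =
  [ Purchase 7 "bob"   "car"                 1
  , Purchase 5 "alice" "car"                 1
  , Purchase 4 "bob"   "legal fees (1 hour)" 20
  , Purchase 3 "alice" "computer"            2
  , Purchase 1 "bob"   "computer"            1
  ]

-- | Mean accumulated spending across persons at the cutoff date, ignoring
-- one excluded item. Nothing if nobody has a qualifying purchase.
meanSpendingBy :: Int -> String -> [(String, Int)] -> [Purchase] -> Maybe Double
meanSpendingBy cutoff excluded priceTable ps
  | null lastTotals = Nothing
  | otherwise       = Just (fromIntegral (sum lastTotals)
                            / fromIntegral (length lastTotals))
  where
    -- Delete any purchase of the excluded item.
    kept = filter ((/= excluded) . item) ps
    -- Merge with prices and compute money-spent = units-bought * price.
    -- Items missing from the price table are skipped (an inner join).
    spent = [ Spend (person p) (date p) (units p * price)
            | p <- kept
            , Just price <- [lookup (item p) priceTable] ]
    -- Group by person; groupBy only merges adjacent rows, so sort first.
    groups = groupBy ((==) `on` spender) (sortOn spender spent)
    -- One accumulated total per person that has a row at or before the cutoff.
    lastTotals = [ total | g <- groups, Just total <- [totalAtCutoff g] ]
    totalAtCutoff g =
      let sorted  = sortOn spentOn g                       -- sort by date
          running = zip (map spentOn sorted)
                        (scanl1 (+) (map amount sorted))   -- running total
      in case takeWhile ((<= cutoff) . fst) running of     -- rows up to cutoff
           [] -> Nothing
           rs -> Just (snd (last rs))                      -- keep the last row

main :: IO ()
main = print (meanSpendingBy 6 "legal fees (1 hour)" prices purchases)
-- prints: Just 4000.0
```

Saved as a single file, this should run with `runghc`. Everything is materialised as lists, so it only covers the in-memory half of the question; a streaming version would need a different structure, such as a single fold keeping one running total per person.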

ocramz, Dec 12 '18 08:12

Started addressing this with some generic conversion machinery in #34

ocramz, Jan 13 '19 13:01

Currently writing an example, will commit soon

UnkDevE, Feb 12 '19 11:02

Wrote the code! I don't know how to open a pull request, however.

UnkDevE, Feb 13 '19 00:02

@UnkDevE you open a PR by starting from the page with your fork, then clicking "Compare" to see your changes in context:

https://github.com/DataHaskell/dh-core/compare/master...UnkDevE:master

Then you can press "Create pull request".

ocramz, Feb 13 '19 12:02

Thanks! Made the pull request.

UnkDevE, Feb 13 '19 19:02

@UnkDevE I was too quick in merging your previous PR; a number of things still needed to be fixed. For the future, could you add your tests to the main test group, so that Travis runs them together and we see if anything is broken? Thanks!

ocramz, Mar 02 '19 20:03

No problem! Will get started on that tomorrow.

UnkDevE, Mar 02 '19 22:03