A dataframe library for Scheme (R6RS)
Scheme (R6RS) Dataframe Library
A dataframe record type with procedures to select, drop, and rename columns, and filter, sort, split, bind, append, join, reshape, and aggregate dataframes.
Related blog posts:
A dataframe record type for Scheme
Select, drop, and rename dataframe columns in Scheme
Split, bind, and append dataframes in Scheme
Filter, partition, and sort dataframes in Scheme
Modify and aggregate dataframes in Scheme
Installation
Akku
$ akku install dataframe
For more information on getting started with Akku, see this blog post.
Import
(import (dataframe))
Table of Contents
Type conversion
(get-type obj)
(guess-type lst n-max)
(convert-type obj type)
Series record type
(make-series name lst)
(make-series* expr)
(series? series)
(series-name series)
(series-lst series)
(series-length series)
(series-type series)
(series-equal? series1 series2 ...)
Dataframe record type
(make-dataframe slist)
(make-df* expr)
(dataframe-slist df)
(dataframe-names df)
(dataframe-dim df)
(dataframe-contains? df name ...)
(dataframe-head df n)
(dataframe-tail df n)
(dataframe-equal? df1 df2 ...)
(dataframe-ref df indices [name ...])
(dataframe-series df name)
(dataframe-values df name)
Dataframe display
(dataframe-display df [n total-width min-width])
(dataframe-glimpse df [total-width])
Dataframe read/write
(dataframe-write df path [overwrite])
(dataframe-read path)
(dataframe->csv df path [overwrite])
(dataframe->tsv df path [overwrite])
(csv->dataframe path [header])
(tsv->dataframe path [header])
Select, drop, and rename columns
(dataframe-select df names)
(dataframe-select* df name ...)
(dataframe-drop df names)
(dataframe-drop* df name ...)
(dataframe-rename df old-names new-names)
(dataframe-rename* df (old-name new-name) ...)
(dataframe-rename-all df new-names)
Filter
(dataframe-unique df)
(dataframe-filter df names procedure)
(dataframe-filter* df names expr)
(dataframe-filter-at df predicate name ...)
(dataframe-filter-all df predicate)
(dataframe-partition df names procedure)
(dataframe-partition* df names expr)
Sort
(dataframe-sort df predicates names)
(dataframe-sort* df (predicate name) ...)
Split, bind, and append
(dataframe-split df group-name ...)
(dataframe-bind df1 df2 [fill-value])
(dataframe-bind-all dfs [fill-value])
(dataframe-append df1 df2 ...)
Crossing
(dataframe-crossing obj1 obj2 ...)
Join
(dataframe-inner-join df1 df2 join-names)
(dataframe-left-join df1 df2 join-names [fill-value])
(dataframe-left-join-all dfs join-names [fill-value])
Reshape
(dataframe-stack df names names-to values-to)
(dataframe-spread df names-from values-from [fill-value])
Modify and aggregate
(dataframe-modify df new-names names procedure ...)
(dataframe-modify* df (new-name names expr) ...)
(dataframe-modify-at df procedure name ...)
(dataframe-modify-all df procedure)
(dataframe-aggregate df group-names new-names names procedure ...)
(dataframe-aggregate* df group-names (new-name names expr) ...)
Thread first and thread last
(-> expr ...)
(->> expr ...)
Missing values
(na? obj)
(any-na? lst)
(remove-na lst)
(dataframe-remove-na df [name ...])
Descriptive statistics
(count obj lst)
(count-elements lst)
(rle lst)
(remove-duplicates lst)
(rep lst n type)
(transpose lst)
(sum lst [na-rm])
(product lst [na-rm])
(mean lst [na-rm])
(weighted-mean lst weights [na-rm])
(variance lst [na-rm])
(standard-deviation lst [na-rm])
(median lst [type na-rm])
(quantile lst p [type na-rm])
(interquartile-range lst [type na-rm])
(cumulative-sum lst)
Type conversion
procedure: (get-type obj)
returns: type of obj (bool, chr, str, sym, num, or other); strings that are valid numbers are assumed to be 'num
procedure: (guess-type lst n-max)
returns: type of elements in lst (bool, chr, str, sym, num, or other); evaluates up to n-max elements of lst before guessing; strings that are valid numbers are assumed to be 'num
> (get-type "3")
num
> (get-type '(1 2 3))
other
> (guess-type '(1 2 3) 3)
num
> (guess-type '(1 "2" 3) 3)
num
> (guess-type '(a b c) 3)
sym
> (guess-type '(a b "c") 3)
str
> (guess-type '(a b "c") 2)
sym
procedure: (convert-type obj type)
returns: an obj converted to type; elements that can't be converted to type are replaced with 'na
;; arguably, this is overly opinionated, but was chosen to avoid surprise about things like
;; (string->symbol "10") --> \x31;0
> (convert-type "c" 'sym)
na
> (convert-type 'b 'str)
"b"
> (map (lambda (x) (convert-type x 'other)) '(a b "c"))
(a b "c")
> (convert-type "3" 'num)
3
> (map (lambda (x) (convert-type x 'num)) '(1 2 3 na "" " " "NA" "na"))
(1 2 3 na na na na na)
> (map (lambda (x) (convert-type x 'str)) '(a "b" c na "" " " "NA" "na"))
("a" "b" "c" na na na na na)
Series record type
procedure: (make-series name lst)
returns: a series record type from name and lst with four fields: name, lst, length, and type
procedure: (make-series* expr)
returns: a series record type from expr with four fields: name, lst, length, and type
> (make-series 'a '(1 2 3))
#[#{series oti45h148lm5x6fghpw1qhjz-20} a (1 2 3) (1 2 3) num 3]
> (make-series* (a 1 2 3))
#[#{series oti45h148lm5x6fghpw1qhjz-20} a (1 2 3) (1 2 3) num 3]
> (make-series 'a '(a b c))
#[#{series oti45h148lm5x6fghpw1qhjz-20} a (a b c) (a b c) sym 3]
> (make-series* (a 'a 'b 'c))
#[#{series oti45h148lm5x6fghpw1qhjz-20} a (a b c) (a b c) sym 3]
procedure: (series? series)
returns: #t if series is a series, #f otherwise
procedure: (series-name series)
returns: series name
procedure: (series-lst series)
returns: series list
procedure: (series-length series)
returns: series length
> (define s (make-series 'a (iota 10)))
> (series-name s)
a
> (series-length s)
10
> (series-lst s)
(0 1 2 3 4 5 6 7 8 9)
procedure: (series-type series)
returns: series type (bool, chr, str, sym, num, or other); implicit conversion rules are applied in make-series*
> (series-type (make-series* (a 1 2 3)))
num
> (series-type (make-series* (a 1 "2" 3)))
num
> (series-type (make-series* (a 1 "b" 3)))
str
> (series-type (make-series* (a "a" "b" "c")))
str
> (series-type (make-series* (a 'a 'b 'c)))
sym
> (series-type (make-series* (a 'a 'b "c")))
str
> (series-type (make-series* (a #t #f)))
bool
> (series-type (make-series* (a #t "#f")))
str
> (series-type (make-series* (a #\a #\b #\c)))
chr
> (series-type (make-series* (a #\a #\b "c")))
str
> (series-type (make-series* (a 1 2 '(3 4))))
other
procedure: (series-equal? series1 series2 ...)
returns: #t if all series are equal, #f otherwise
> (series-equal?
(make-series* (a 1 2 3))
(make-series* (a 1 "2" 3)))
#t
> (series-equal?
(make-series* (a "a" "b" "c"))
(make-series* (a 'a 'b "c")))
#t
> (series-equal?
(make-series* (a "a" "b" "c"))
(make-series* (a 'a 'b 'c)))
#f
> (series-equal?
(make-series* (a 1 2 3))
(make-series* (a 1 "2" 3))
(make-series* (b 1 2 3)))
#f
Dataframe record type
procedure: (make-dataframe slist)
returns: a dataframe record type from a list of series (slist) with three fields: slist, names, and dim
procedure: (make-df* expr)
returns: a dataframe record type from expr with three fields: slist, names, and dim
> (make-dataframe (list (make-series* (a 1 2 3)) (make-series* (b 4 5 6))))
#[#{dataframe mcq0csmab1sjwlyjv093af7t1-20} (#[#{series mcq0csmab1sjwlyjv093af7t1-21} a (1 2 3) (1 2 3) num 3] #[#{series mcq0csmab1sjwlyjv093af7t1-21} b (4 5 6) (4 5 6) num 3]) (a b) (3 . 2)]
> (make-df* (a 1 2 3) (b 4 5 6))
#[#{dataframe mcq0csmab1sjwlyjv093af7t1-20} (#[#{series mcq0csmab1sjwlyjv093af7t1-21} a (1 2 3) (1 2 3) num 3] #[#{series mcq0csmab1sjwlyjv093af7t1-21} b (4 5 6) (4 5 6) num 3]) (a b) (3 . 2)]
> (dataframe? (make-df* (a 1 2 3)))
#t
> (dataframe? (list (make-series* (a 1 2 3))))
#f
> (make-df* ("a" 1 2 3))
Exception in (make-series name src): name(s) not symbol(s)
procedure: (dataframe-slist df)
returns: a list of the series that comprise dataframe df
> (dataframe-slist (make-df* (a 1 2 3) (b 4 5 6)))
(#[#{series cr52mzjx42dc7eg7ul2sn36zu-20} a (1 2 3) (1 2 3) num 3]
#[#{series cr52mzjx42dc7eg7ul2sn36zu-20} b (4 5 6) (4 5 6) num 3])
procedure: (dataframe-names df)
returns: a list of symbols representing the names of columns in dataframe df
> (dataframe-names (make-df* (a 1) (b 2) (c 3) (d 4)))
(a b c d)
procedure: (dataframe-dim df)
returns: a pair of the number of rows and columns (rows . columns) in dataframe df
> (dataframe-dim (make-df* (a 1) (b 2) (c 3) (d 4)))
(1 . 4)
> (dataframe-dim (make-df* (a 1 2 3) (b 4 5 6)))
(3 . 2)
procedure: (dataframe-contains? df name ...)
returns: #t if all column names are found in dataframe df, #f otherwise
> (define df (make-df* (a 1) (b 2) (c 3) (d 4)))
> (dataframe-contains? df 'a 'c 'd)
#t
> (dataframe-contains? df 'b 'e)
#f
procedure: (dataframe-head df n)
returns: a dataframe with first n rows from dataframe df
procedure: (dataframe-tail df n)
returns: a dataframe with the rows of dataframe df that remain after dropping the first n rows (the nth tail, zero-based, analogous to list-tail)
> (define df (make-df* (a 1 2 3 1 2 3) (b 4 5 6 4 5 6) (c 7 8 9 -999 -999 -999)))
> (dataframe-display (dataframe-head df 3))
dim: 3 rows x 3 cols
a b c
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
> (dataframe-display (dataframe-tail df 2))
dim: 4 rows x 3 cols
a b c
<num> <num> <num>
3. 6. 9.
1. 4. -999.
2. 5. -999.
3. 6. -999.
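Because dataframe-tail follows list-tail semantics, calling dataframe-head and dataframe-tail with the same n splits a dataframe into two complementary pieces. The sketch below illustrates this using only procedures documented in this README (dataframe-bind is described in a later section). It assumes a Chez Scheme session with the library installed; the names top, bottom, and roundtrip-ok are made up for the example.

```scheme
(import (dataframe))

(define df (make-df* (a 1 2 3 1 2 3) (b 4 5 6 4 5 6) (c 7 8 9 -999 -999 -999)))

;; the first 2 rows, and everything after the first 2 rows
(define top (dataframe-head df 2))
(define bottom (dataframe-tail df 2))

(dataframe-dim top)     ; (2 . 3)
(dataframe-dim bottom)  ; (4 . 3)

;; binding the two pieces by row is expected to reproduce the original
(define roundtrip-ok (dataframe-equal? df (dataframe-bind top bottom)))
```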
procedure: (dataframe-equal? df1 df2 ...)
returns: #t if all dataframes are equal, #f otherwise
> (dataframe-equal? (make-df* (a 1 2 3))
(make-df* (a 1 "2" 3)))
#t
> (dataframe-equal? (make-df* (a 1 2 3) (b 4 5 6))
(make-df* (b 4 5 6) (a 1 2 3)))
#f
> (dataframe-equal? (make-df* (a 1 2 3) (b 4 5 6))
(make-df* (a 10 2 3) (b 4 5 6)))
#f
procedure: (dataframe-ref df indices [name ...])
returns: a dataframe with only rows indicated by indices from dataframe df; default is to return all columns, but can optionally specify column name(s)
> (define df (make-df* (a 100 200 300) (b 4 5 6) (c 700 800 900)))
> (dataframe-display df)
dim: 3 rows x 3 cols
a b c
<num> <num> <num>
100. 4. 700.
200. 5. 800.
300. 6. 900.
> (dataframe-display (dataframe-ref df '(0 2)))
dim: 2 rows x 3 cols
a b c
<num> <num> <num>
100. 4. 700.
300. 6. 900.
> (dataframe-display (dataframe-ref df '(0 2) 'a 'c))
dim: 2 rows x 2 cols
a c
<num> <num>
100. 700.
300. 900.
procedure: (dataframe-series df name)
returns: a series for column name from dataframe df
procedure: (dataframe-values df name)
returns: a list of values for column name from dataframe df
> (define df (make-df* (a 100 200 300) (b 4 5 6) (c 700 800 900)))
> (dataframe-series df 'b)
#[#{series ey38a8jsdkhs5t8j9gl1fo67w-59} b (4 5 6) (4 5 6) num 3]
> (dataframe-values df 'b)
(4 5 6)
> ($ df 'b) ; $ is shorthand for dataframe-values; inspired by R, e.g., df$b.
(4 5 6)
> (map (lambda (name) ($ df name)) '(c a))
((700 800 900) (100 200 300))
Dataframe display
procedure: (dataframe-display df [n total-width min-width])
displays: the dataframe df up to n rows and the number of columns that fit in total-width based on the actual contents of column or minimum column width min-width; total-width and min-width are measured in number of characters; default values: n = 10, total-width = 76, min-width = 7
procedure: (dataframe-glimpse df [total-width])
displays: a transposed version of dataframe-display where the column names and types are displayed vertically and the data runs across the page up to total-width, which has a default value of 76.
> (define df
(make-df*
(Boolean #t #f #t)
(Char #\y #\e #\s)
(String "these" "are" "strings")
(Symbol 'these 'are 'symbols)
(Exact 1/2 1/3 1/4)
(Integer 1 -2 3)
(Expt 1e6 -123456 1.2346e-6)
(Dec4 132.1 -157 10.234) ; based on size of numbers
(Dec2 1234 5784 -76833.123)
(Other (cons 1 2) '(a b c) (make-df* (a 2)))))
> (dataframe-display df 3 90)
dim: 3 rows x 10 cols
Boolean Char String Symbol Exact Integer Expt Dec4 Dec2 Other
<bool> <chr> <str> <sym> <num> <num> <num> <num> <num> <other>
#t y these these 1/2 1. 1.000E+6 132.1000 1234.00 <pair>
#f e are are 1/3 -2. -1.235E+5 -157.0000 5784.00 <list>
#t s strings symbols 1/4 3. 1.235E-6 10.2340 -76833.12 <dataframe>
> (dataframe-glimpse df)
dim: 3 rows x 10 cols
Boolean <bool> #t, #f, #t
Char <chr> y, e, s
String <str> these, are, strings
Symbol <sym> these, are, symbols
Exact <num> 1/2, 1/3, 1/4
Integer <num> 1, -2, 3
Expt <num> 1000000.0, -123456, 1.2346e-6
Dec4 <num> 132.1, -157, 10.234
Dec2 <num> 1234, 5784, -76833.123
Other <other> <pair>, <list>, <dataframe>
> (define df2
(make-dataframe
(list
(make-series 'a (iota 25))
(make-series 'b (map add1 (iota 25))))))
> (dataframe-display df2 5)
dim: 25 rows x 2 cols
a b
<num> <num>
0. 1.
1. 2.
2. 3.
3. 4.
4. 5.
> (dataframe-glimpse df2)
dim: 25 rows x 2 cols
a <num> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
b <num> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...
Dataframe read/write
procedure: (dataframe-write df path [overwrite])
procedure: (dataframe->csv df path [overwrite])
procedure: (dataframe->tsv df path [overwrite])
writes: a dataframe df as a Scheme object or CSV/TSV file to path; default value for overwrite is #t
procedure: (dataframe-read path)
procedure: (csv->dataframe path [header])
procedure: (tsv->dataframe path [header])
returns: a dataframe from Scheme object or CSV/TSV file at path; for CSV/TSV file, default value for header is #t
> (define df
(make-df*
(Boolean #t #f #t)
(Char #\y #\e #\s)
(String "these" "are" "strings")
(Symbol 'these 'are 'symbols)
(Number 1.1 2 3.2)
(Other (cons 1 2) '(a b c) (make-df* (a 2)))))
> (dataframe-display df)
dim: 3 rows x 6 cols
Boolean Char String Symbol Number Other
<bool> <chr> <str> <sym> <num> <other>
#t y these these 1.1000 <pair>
#f e are are 2.0000 <list>
#t s strings symbols 3.2000 <dataframe>
> (dataframe-write df "df-example.scm")
> (dataframe-display (dataframe-read "df-example.scm"))
;; types are preserved
dim: 3 rows x 6 cols
Boolean Char String Symbol Number Other
<bool> <chr> <str> <sym> <num> <other>
#t y these these 1.1000 <pair>
#f e are are 2.0000 <list>
#t s strings symbols 3.2000 <dataframe>
> (dataframe->csv df "df-example.csv")
> (dataframe-display (csv->dataframe "df-example.csv"))
;; types are not preserved; for `other`, values are not preserved
dim: 3 rows x 6 cols
Boolean Char String Symbol Number Other
<str> <str> <str> <str> <num> <na>
#t y these these 1.1000 na
#f e are are 2.0000 na
#t s strings symbols 3.2000 na
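The difference between the two storage formats can also be checked programmatically rather than by eye. This is a minimal sketch, assuming a Chez Scheme session with the library installed and a writable working directory; the file name df-roundtrip.scm and the variable names are hypothetical.

```scheme
(import (dataframe))

(define df
  (make-df*
    (site "a" "b" "c")
    (trt 'x 'y 'z)
    (n 1 2 3)))

;; overwrite defaults to #t, so re-running this replaces the file
(dataframe-write df "df-roundtrip.scm")

;; column types survive dataframe-write, so the dataframe read back
;; is expected to be equal to the original
(define df-back (dataframe-read "df-roundtrip.scm"))
(define same? (dataframe-equal? df df-back))
```

The same comparison after a CSV or TSV round trip would not be expected to hold for this dataframe, because the symbol column comes back as strings.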
Select, drop, and rename columns
procedure: (dataframe-select df names)
returns: a dataframe of columns with names selected from dataframe df
procedure: (dataframe-select* df name ...)
returns: a dataframe of columns with name(s) selected from dataframe df
> (define df (make-df* (a 1 2 3) (b 4 5 6) (c 7 8 9)))
> (dataframe-display (dataframe-select df '(a)))
dim: 3 rows x 1 cols
a
<num>
1.
2.
3.
> (dataframe-display (dataframe-select* df a))
dim: 3 rows x 1 cols
a
<num>
1.
2.
3.
> (dataframe-display (dataframe-select df '(c b)))
dim: 3 rows x 2 cols
c b
<num> <num>
7. 4.
8. 5.
9. 6.
> (dataframe-display (dataframe-select* df c b))
dim: 3 rows x 2 cols
c b
<num> <num>
7. 4.
8. 5.
9. 6.
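procedure: (dataframe-drop df names)
returns: a dataframe of columns with names dropped from dataframe df
procedure: (dataframe-drop* df name ...)
returns: a dataframe of columns with name(s) dropped from dataframe df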
> (define df (make-df* (a 1 2 3) (b 4 5 6) (c 7 8 9)))
> (dataframe-display (dataframe-drop df '(c b)))
dim: 3 rows x 1 cols
a
<num>
1.
2.
3.
> (dataframe-display (dataframe-drop* df c b))
dim: 3 rows x 1 cols
a
<num>
1.
2.
3.
> (dataframe-display (dataframe-drop df '(a)))
dim: 3 rows x 2 cols
b c
<num> <num>
4. 7.
5. 8.
6. 9.
> (dataframe-display (dataframe-drop* df a))
dim: 3 rows x 2 cols
b c
<num> <num>
4. 7.
5. 8.
6. 9.
procedure: (dataframe-rename df old-names new-names)
returns: a dataframe with a list of column names old-names from dataframe df renamed to new-names
procedure: (dataframe-rename* df (old-name new-name) ...)
returns: a dataframe with column names from dataframe df renamed according to name pairs (old-name new-name)
procedure: (dataframe-rename-all df new-names)
returns: a dataframe with new-names replacing column names from dataframe df
> (define df (make-df* (a 1 2 3) (b 4 5 6) (c 7 8 9)))
> (dataframe-display (dataframe-rename df '(b c) '(Bee Sea)))
dim: 3 rows x 3 cols
a Bee Sea
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
> (dataframe-display (dataframe-rename* df (b Bee) (c Sea)))
dim: 3 rows x 3 cols
a Bee Sea
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
;; no change made when old name is not found
> (dataframe-display (dataframe-rename* df (d Dee)))
dim: 3 rows x 3 cols
a b c
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
> (dataframe-display (dataframe-rename-all df '(A B C)))
dim: 3 rows x 3 cols
A B C
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
Filter
procedure: (dataframe-unique df)
returns: a dataframe with only the unique rows of dataframe df
> (define df
(make-df*
(Name "Peter" "Paul" "Mary" "Peter")
(Pet "Rabbit" "Cat" "Dog" "Rabbit")))
> (dataframe-display (dataframe-unique df))
dim: 3 rows x 2 cols
Name Pet
<str> <str>
Peter Rabbit
Paul Cat
Mary Dog
> (define df2
(make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (dataframe-display
(dataframe-unique (dataframe-select* df2 grp trt)))
dim: 4 rows x 2 cols
grp trt
<sym> <sym>
a a
a b
b a
b b
procedure: (dataframe-filter df names procedure)
returns: a dataframe where the rows of dataframe df are filtered based on procedure applied to columns names
procedure: (dataframe-filter* df names expr)
returns: a dataframe where the rows of dataframe df are filtered based on expr applied to columns names
> (define df
(make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (dataframe-display (dataframe-filter df '(adult) (lambda (adult) (> adult 3))))
dim: 2 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b b 4. 40.
b b 5. 50.
> (dataframe-display (dataframe-filter* df (adult) (> adult 3)))
dim: 2 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b b 4. 40.
b b 5. 50.
> (dataframe-display
(dataframe-filter df '(grp juv) (lambda (grp juv) (and (symbol=? grp 'b) (< juv 50)))))
dim: 2 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b a 3. 30.
b b 4. 40.
> (dataframe-display
(dataframe-filter* df (grp juv) (and (symbol=? grp 'b) (< juv 50))))
dim: 2 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b a 3. 30.
b b 4. 40.
procedure: (dataframe-filter-at df predicate name ...)
returns: a dataframe where the rows of dataframe df are filtered based on predicate applied to the specified columns (name ...)
procedure: (dataframe-filter-all df predicate)
returns: a dataframe where the rows of dataframe df are filtered based on predicate applied to all columns
> (define df
(make-df*
(a 1 'na 3)
(b 'na 5 6)
(c 7 'na 9)))
> (dataframe-display df)
dim: 3 rows x 3 cols
a b c
<num> <num> <num>
1 na 7
na 5 na
3 6 9
> (dataframe-display (dataframe-filter-at df number? 'a 'c))
dim: 2 rows x 3 cols
a b c
<num> <num> <num>
1. na 7.
3. 6 9.
> (dataframe-display (dataframe-filter-all df number?))
dim: 1 rows x 3 cols
a b c
<num> <num> <num>
3. 6. 9.
procedure: (dataframe-partition df names procedure)
returns: two dataframes where the rows of dataframe df are partitioned based on procedure applied to columns names
procedure: (dataframe-partition* df names expr)
returns: two dataframes where the rows of dataframe df are partitioned based on expr applied to columns names
> (define df
(make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (define-values (keep drop)
(dataframe-partition df '(adult) (lambda (adult) (> adult 3))))
> (define-values (keep* drop*)
(dataframe-partition* df (adult) (> adult 3)))
> (dataframe-display keep)
dim: 2 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b b 4. 40.
b b 5. 50.
> (dataframe-display drop)
dim: 3 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
a a 1. 10.
a b 2. 20.
b a 3. 30.
> (dataframe-equal? keep keep*)
#t
> (dataframe-equal? drop drop*)
#t
Sort
procedure: (dataframe-sort df predicates names)
returns: a dataframe where the rows of dataframe df are sorted according to a list of predicate procedures acting on a list of column names
procedure: (dataframe-sort* df (predicate name) ...)
returns: a dataframe where the rows of dataframe df are sorted according to the predicate name pairings
> (define df
(make-df*
(grp "a" "a" "b" "b" "b")
(trt "a" "b" "a" "b" "b")
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (dataframe-display (dataframe-sort df (list string>?) '(trt)))
dim: 5 rows x 4 cols
grp trt adult juv
<str> <str> <num> <num>
a b 2. 20.
b b 4. 40.
b b 5. 50.
a a 1. 10.
b a 3. 30.
> (dataframe-display (dataframe-sort* df (string>? trt)))
dim: 5 rows x 4 cols
grp trt adult juv
<str> <str> <num> <num>
a b 2. 20.
b b 4. 40.
b b 5. 50.
a a 1. 10.
b a 3. 30.
> (dataframe-display (dataframe-sort df (list string>? >) '(trt adult)))
dim: 5 rows x 4 cols
grp trt adult juv
<str> <str> <num> <num>
b b 5. 50.
b b 4. 40.
a b 2. 20.
b a 3. 30.
a a 1. 10.
> (dataframe-display (dataframe-sort* df (string>? trt) (> adult)))
dim: 5 rows x 4 cols
grp trt adult juv
<str> <str> <num> <num>
b b 5. 50.
b b 4. 40.
a b 2. 20.
b a 3. 30.
a a 1. 10.
Split, bind, and append
procedure: (dataframe-split df group-name ...)
returns: a list of dataframes split into unique groups by group-name(s) from dataframe df; requires that all values in each grouping column are the same type
> (define df
(make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (for-each dataframe-display (dataframe-split df 'grp))
dim: 2 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
a a 1. 10.
a b 2. 20.
dim: 3 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b a 3. 30.
b b 4. 40.
b b 5. 50.
> (for-each dataframe-display (dataframe-split df 'grp 'trt))
dim: 1 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
a a 1. 10.
dim: 1 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
a b 2. 20.
dim: 1 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b a 3. 30.
dim: 2 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
b b 4. 40.
b b 5. 50.
procedure: (dataframe-bind df1 df2 [fill-value])
returns: a dataframe formed by binding the rows of the dataframes df1 and df2, keeping all columns; fill-value is used to fill values for columns that are not common to both dataframes and defaults to 'na
procedure: (dataframe-bind-all dfs [fill-value])
returns: a dataframe formed by binding the rows of the list of dataframes dfs, keeping all columns; fill-value is used as in dataframe-bind
> (define df
(make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (dataframe-display (dataframe-bind-all (dataframe-split df 'grp 'trt)))
dim: 5 rows x 4 cols
grp trt adult juv
<sym> <sym> <num> <num>
a a 1. 10.
a b 2. 20.
b a 3. 30.
b b 4. 40.
b b 5. 50.
> (define df1 (make-df* (a 1 2 3) (b 10 20 30) (c 100 200 300)))
> (define df2 (make-df* (a 4 5 6) (b 40 50 60)))
> (dataframe-display (dataframe-bind df1 df2))
dim: 6 rows x 3 cols
a b c
<num> <num> <num>
1. 10. 100
2. 20. 200
3. 30. 300
4. 40. na
5. 50. na
6. 60. na
> (dataframe-display (dataframe-bind df2 df1))
dim: 6 rows x 3 cols
a b c
<num> <num> <num>
4. 40. na
5. 50. na
6. 60. na
1. 10. 100
2. 20. 200
3. 30. 300
> (dataframe-display (dataframe-bind df1 df2 -999))
dim: 6 rows x 3 cols
a b c
<num> <num> <num>
1. 10. 100.
2. 20. 200.
3. 30. 300.
4. 40. -999.
5. 50. -999.
6. 60. -999.
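The fill-value argument applies to dataframe-bind-all in the same way. The following sketch binds three dataframes with partially overlapping columns; it assumes a Chez Scheme session with the library installed, and the dataframes and the name bound are invented for illustration.

```scheme
(import (dataframe))

(define df1 (make-df* (a 1 2) (b 3 4)))
(define df2 (make-df* (a 5)))
(define df3 (make-df* (c 6)))

;; bind the rows of all three dataframes; cells for columns that a
;; dataframe lacks are filled with 0 instead of the default 'na
(define bound (dataframe-bind-all (list df1 df2 df3) 0))

(dataframe-dim bound)        ; (4 . 3): 2 + 1 + 1 rows; columns a, b, c
(dataframe-values bound 'a)  ; (1 2 5 0)
(dataframe-values bound 'c)  ; (0 0 0 6)
```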
procedure: (dataframe-append df1 df2 ...)
returns: a dataframe formed by appending columns of the dataframes df1 df2 ...
> (define df1 (make-df* (a 1 2 3) (b 4 5 6)))
> (define df2 (make-df* (c 7 8 9) (d 10 11 12)))
> (dataframe-display (dataframe-append df1 df2))
dim: 3 rows x 4 cols
a b c d
<num> <num> <num> <num>
1. 4. 7. 10.
2. 5. 8. 11.
3. 6. 9. 12.
> (dataframe-display (dataframe-append df2 df1))
dim: 3 rows x 4 cols
c d a b
<num> <num> <num> <num>
7. 10. 1. 4.
8. 11. 2. 5.
9. 12. 3. 6.
Crossing
procedure: (dataframe-crossing obj1 obj2 ...)
returns: a dataframe formed from the cartesian products of obj1, obj2, etc.; objects must be either series or dataframes
> (dataframe-display
(dataframe-crossing
(make-series* (col1 'a 'b))
(make-series* (col2 'c 'd))))
dim: 4 rows x 2 cols
col1 col2
<sym> <sym>
a c
a d
b c
b d
> (dataframe-display
(dataframe-crossing
(make-series* (col1 'a 'b))
(make-df* (col2 'c 'd))))
dim: 4 rows x 2 cols
col1 col2
<sym> <sym>
a c
a d
b c
b d
> (dataframe-display
(dataframe-crossing
(make-df* (col1 'a 'b) (col2 'c 'd))
(make-df* (col3 'e 'f) (col4 'g 'h))))
dim: 4 rows x 4 cols
col1 col2 col3 col4
<sym> <sym> <sym> <sym>
a c e g
a c f h
b d e g
b d f h
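A common use for crossing is to build a full grid of factor combinations, for example an experimental design. This is a sketch under the assumption of a Chez Scheme session with the library installed; the column names and values are hypothetical.

```scheme
(import (dataframe))

;; every combination of 2 sites, 3 treatments, and 2 replicates
(define design
  (dataframe-crossing
    (make-series* (site "north" "south"))
    (make-series* (trt 'control 'low 'high))
    (make-series* (replicate 1 2))))

(dataframe-dim design)    ; (12 . 3), i.e., 2 * 3 * 2 rows
(dataframe-names design)  ; (site trt replicate)
```

The number of rows is the product of the row counts of the inputs, so crossing grows quickly as more objects are added.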
Join
procedure: (dataframe-inner-join df1 df2 join-names)
returns: a dataframe formed by joining on the columns, join-names, of the dataframes df1 and df2; retains only rows that match in both dataframes
procedure: (dataframe-left-join df1 df2 join-names [fill-value])
returns: a dataframe formed by joining on the columns, join-names, of the dataframes df1 and df2 where df1 is the left dataframe; rows in df1 not matched by any rows in df2 are filled with fill-value, which defaults to 'na
procedure: (dataframe-left-join-all dfs join-names [fill-value])
returns: a dataframe formed by joining on the columns, join-names, of the list of dataframes dfs where each dataframe is recursively joined to the previous one in the list
> (define df1
(make-df*
(site "b" "a" "c")
(habitat "grassland" "meadow" "woodland")))
> (define df2
(make-df*
(site "c" "b" "c" "b" "d")
(day 1 1 2 2 1)
(catch 10 12 20 24 100)))
> (dataframe-display (dataframe-left-join df1 df2 '(site)))
dim: 5 rows x 4 cols
site habitat day catch
<str> <str> <num> <num>
b grassland 1 12
b grassland 2 24
a meadow na na
c woodland 1 10
c woodland 2 20
> (dataframe-display (dataframe-inner-join df1 df2 '(site)))
dim: 4 rows x 4 cols
site habitat day catch
<str> <str> <num> <num>
b grassland 1. 12.
b grassland 2. 24.
c woodland 1. 10.
c woodland 2. 20.
> (dataframe-display (dataframe-left-join df2 df1 '(site)))
dim: 5 rows x 4 cols
site day catch habitat
<str> <num> <num> <str>
c 1. 10. woodland
c 2. 20. woodland
b 1. 12. grassland
b 2. 24. grassland
d 1. 100. na
> (dataframe-display (dataframe-inner-join df2 df1 '(site)))
dim: 4 rows x 4 cols
site day catch habitat
<str> <num> <num> <str>
c 1. 10. woodland
c 2. 20. woodland
b 1. 12. grassland
b 2. 24. grassland
> (dataframe-display (dataframe-left-join-all (list df2 df1) '(site)))
dim: 5 rows x 4 cols
site day catch habitat
<str> <num> <num> <str>
c 1. 10. woodland
c 2. 20. woodland
b 1. 12. grassland
b 2. 24. grassland
d 1. 100. na
> (define df3
(make-df*
(first "sam" "bob" "sam" "dan")
(last "son" "ert" "jam" "man")
(age 10 20 30 40)))
> (define df4
(make-df*
(first "sam" "bob" "dan" "bob")
(last "son" "ert" "man" "ert")
(game 1 1 1 2)
(goals 0 1 2 3)))
> (dataframe-display (dataframe-left-join df3 df4 '(first last) -999))
dim: 5 rows x 5 cols
first last age game goals
<str> <str> <num> <num> <num>
sam son 10. 1. 0.
bob ert 20. 1. 1.
bob ert 20. 2. 3.
sam jam 30. -999. -999.
dan man 40. 1. 2.
> (dataframe-display (dataframe-inner-join df3 df4 '(first last)))
dim: 4 rows x 5 cols
first last age game goals
<str> <str> <num> <num> <num>
sam son 10. 1. 0.
bob ert 20. 1. 1.
bob ert 20. 2. 3.
dan man 40. 1. 2.
> (dataframe-display (dataframe-left-join df4 df3 '(first last)))
dim: 4 rows x 5 cols
first last game goals age
<str> <str> <num> <num> <num>
sam son 1. 0. 10.
bob ert 1. 1. 20.
bob ert 2. 3. 20.
dan man 1. 2. 40.
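With more than two dataframes, dataframe-left-join-all saves writing nested calls. The sketch below joins three small dataframes on site and compares the result with the hand-nested version; based on the description above, the two should be equal. It assumes a Chez Scheme session with the library installed, and all data and names are made up for the example.

```scheme
(import (dataframe))

(define sites (make-df* (site "a" "b" "c") (region "east" "east" "west")))
(define habitats (make-df* (site "a" "b") (habitat "meadow" "grassland")))
(define elevations (make-df* (site "b" "c") (elev 100 200)))

;; join habitats onto sites, then elevations onto that result
(define joined
  (dataframe-left-join-all (list sites habitats elevations) '(site)))

;; the same joins written out by hand
(define nested
  (dataframe-left-join
    (dataframe-left-join sites habitats '(site))
    elevations
    '(site)))

(dataframe-dim joined)    ; (3 . 4)
(dataframe-names joined)  ; (site region habitat elev)
```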
Reshape
procedure: (dataframe-stack df names names-to values-to)
returns: a dataframe formed by stacking pieces of a wide-format dataframe df; names is a list of column names to be combined into a single column; names-to is the name of the new column formed from the columns selected in names; values-to is the name of the new column formed from the values in the columns selected in names
> (define df
(make-df*
(day 1 2)
(hour 10 11)
(a 97 78)
(b 84 47)
(c 55 54)))
> (dataframe-display (dataframe-stack df '(a b c) 'site 'count))
dim: 6 rows x 4 cols
day hour site count
<num> <num> <sym> <num>
1. 10. a 97.
2. 11. a 78.
1. 10. b 84.
2. 11. b 47.
1. 10. c 55.
2. 11. c 54.
;; reshaping to long format is useful for aggregating
> (-> (make-df*
(day 1 1 2 2)
(hour 10 11 10 11)
(a 97 78 83 80)
(b 84 47 73 46)
(c 55 54 38 58))
(dataframe-stack '(a b c) 'site 'count)
(dataframe-aggregate*
(hour site)
(total-count (count) (apply + count)))
(dataframe-display))
dim: 6 rows x 3 cols
hour site total-count
<num> <sym> <num>
10. a 180.
11. a 158.
10. b 157.
11. b 93.
10. c 93.
11. c 112.
procedure: (dataframe-spread df names-from values-from [fill-value])
returns: a dataframe formed by spreading a long-format dataframe df into a wide-format dataframe; names-from is the name of the column containing the names of the new columns; values-from is the name of the column containing the values that will be spread across the new columns; fill-value is used to fill combinations that are not found in the long-format df and defaults to 'na
> (define df1
(make-df*
(day 1 1 2)
(grp "A" "B" "B")
(val 10 20 30)))
> (dataframe-display (dataframe-spread df1 'grp 'val))
dim: 2 rows x 3 cols
day A B
<num> <num> <num>
1. 10 20.
2. na 30.
> (dataframe-display (dataframe-spread df1 'grp 'val 0))
dim: 2 rows x 3 cols
day A B
<num> <num> <num>
1. 10. 20.
2. 0. 30.
> (define df2
(make-df*
(day 1 1 1 1 2 2 2 2)
(hour 10 10 11 11 10 10 11 11)
(grp 'a 'b 'a 'b 'a 'b 'a 'b)
(val 83 78 80 105 95 77 96 99)))
> (dataframe-display (dataframe-spread df2 'grp 'val))
dim: 4 rows x 4 cols
day hour a b
<num> <num> <num> <num>
1. 10. 83. 78.
1. 11. 80. 105.
2. 10. 95. 77.
2. 11. 96. 99.
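Stacking and spreading are inverse operations, which the following sketch illustrates by taking a wide dataframe to long format and back. It assumes a Chez Scheme session with the library installed; the names wide-df, long-df, and wide-again are hypothetical.

```scheme
(import (dataframe))

(define wide-df
  (make-df*
    (day 1 2)
    (hour 10 11)
    (a 97 78)
    (b 84 47)
    (c 55 54)))

;; wide -> long: columns a, b, c become rows keyed by site
(define long-df (dataframe-stack wide-df '(a b c) 'site 'count))

;; long -> wide: the values in site become column names again
(define wide-again (dataframe-spread long-df 'site 'count))

(dataframe-dim long-df)        ; (6 . 4)
(dataframe-dim wide-again)     ; (2 . 5)
(dataframe-values wide-again 'a)
```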
Modify and aggregate
procedure: (dataframe-modify df new-names names procedure ...)
returns: a dataframe where the columns names of dataframe df are modified according to the procedure
procedure: (dataframe-modify* df (new-name names expr) ...)
returns: a dataframe where the columns names of dataframe df are modified according to the expr
> (define df
(make-df*
(grp "a" "a" "b" "b" "b")
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
;; if new name occurs in dataframe, then column is replaced
;; if not, then new column is added
;; expr can refer to columns created in previous expr within the same call to dataframe-modify
;; if names is empty:
;;   - a procedure or expr that yields a scalar is repeated to match the number of rows in the dataframe
;;   - a procedure or expr that yields a list of length equal to the number of rows in the dataframe is used as a column
> (dataframe-display
(dataframe-modify
df
'(grp total prop-juv scalar lst)
'((grp) (adult juv) (juv total) () ())
(lambda (grp) (string-upcase grp))
(lambda (adult juv) (+ adult juv))
(lambda (juv total) (/ juv total))
(lambda () 42)
(lambda () '(2 4 6 8 10))))
dim: 5 rows x 8 cols
grp trt adult juv total prop-juv scalar lst
<str> <sym> <num> <num> <num> <num> <num> <num>
A a 1. 10. 11. 10/11 42. 2.
A b 2. 20. 22. 10/11 42. 4.
B a 3. 30. 33. 10/11 42. 6.
B b 4. 40. 44. 10/11 42. 8.
B b 5. 50. 55. 10/11 42. 10.
> (dataframe-display
(dataframe-modify*
df
(grp (grp) (string-upcase grp))
(total (adult juv) (+ adult juv))
(prop-juv (juv total) (/ juv total))
(scalar () 42)
(lst () '(2 4 6 8 10))))
dim: 5 rows x 8 cols
grp trt adult juv total prop-juv scalar lst
<str> <sym> <num> <num> <num> <num> <num> <num>
A a 1. 10. 11. 10/11 42. 2.
A b 2. 20. 22. 10/11 42. 4.
B a 3. 30. 33. 10/11 42. 6.
B b 4. 40. 44. 10/11 42. 8.
B b 5. 50. 55. 10/11 42. 10.
procedure: (dataframe-modify-at df procedure name ...)
returns: a dataframe where the specified columns names of dataframe df are modified based on procedure, which can only take one argument
procedure: (dataframe-modify-all df procedure)
returns: a dataframe where all columns of dataframe df are modified based on procedure, which can only take one argument
> (define df
(make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (dataframe-display (dataframe-modify-at df symbol->string 'grp 'trt))
dim: 5 rows x 4 cols
grp trt adult juv
<str> <str> <num> <num>
a a 1. 10.
a b 2. 20.
b a 3. 30.
b b 4. 40.
b b 5. 50.
> (define df2
(make-df*
(a 1 2 3)
(b 4 5 6)
(c 7 8 9)))
> (dataframe-display
(dataframe-modify-all df2 (lambda (x) (* x 100))))
dim: 3 rows x 3 cols
a b c
<num> <num> <num>
100. 400. 700.
200. 500. 800.
300. 600. 900.
procedure: (dataframe-aggregate df group-names new-names names procedure ...)
returns: a dataframe where the dataframe df is split according to the list group-names and aggregated by applying each procedure to the columns listed in names
procedure: (dataframe-aggregate* df group-names (new-name names expr) ...)
returns: a dataframe where the dataframe df is split according to the list group-names and aggregated by applying each expr to the columns listed in names
> (define df
(make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50)))
> (dataframe-display
(dataframe-aggregate
df
'(grp)
'(adult-sum juv-sum)
'((adult) (juv))
(lambda (adult) (sum adult))
(lambda (juv) (sum juv))))
dim: 2 rows x 3 cols
grp adult-sum juv-sum
<sym> <num> <num>
a 3. 30.
b 12. 120.
> (dataframe-display
(dataframe-aggregate*
df
(grp)
(adult-sum (adult) (sum adult))
(juv-sum (juv) (sum juv))))
dim: 2 rows x 3 cols
grp adult-sum juv-sum
<sym> <num> <num>
a 3. 30.
b 12. 120.
> (dataframe-display
(dataframe-aggregate
df
'(grp trt)
'(adult-sum juv-sum)
'((adult) (juv))
(lambda (adult) (sum adult))
(lambda (juv) (sum juv))))
dim: 4 rows x 4 cols
grp trt adult-sum juv-sum
<sym> <sym> <num> <num>
a a 1. 10.
a b 2. 20.
b a 3. 30.
b b 9. 90.
> (dataframe-display
(dataframe-aggregate*
df
(grp trt)
(adult-sum (adult) (sum adult))
(juv-sum (juv) (sum juv))))
dim: 4 rows x 4 cols
grp trt adult-sum juv-sum
<sym> <sym> <num> <num>
a a 1. 10.
a b 2. 20.
b a 3. 30.
b b 9. 90.
Thread first and thread last
procedure: (-> expr ...)
returns: an object derived from passing the result of the previous expr as the first argument of the next expr
procedure: (->> expr ...)
returns: an object derived from passing the result of the previous expr as the last argument of the next expr
> (-> '(1 2 3)
(mean)
(+ 10))
12
> (-> (make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50))
(dataframe-modify*
(total (adult juv) (+ adult juv)))
(dataframe-display))
dim: 5 rows x 5 cols
grp trt adult juv total
<sym> <sym> <num> <num> <num>
a a 1. 10. 11.
a b 2. 20. 22.
b a 3. 30. 33.
b b 4. 40. 44.
b b 5. 50. 55.
> (-> (make-df*
(grp 'a 'a 'b 'b 'b)
(trt 'a 'b 'a 'b 'b)
(adult 1 2 3 4 5)
(juv 10 20 30 40 50))
(dataframe-split 'grp)
(->> (map (lambda (df)
(dataframe-modify*
df
(juv-mean () (mean ($ df 'juv)))))))
(->> (dataframe-bind-all))
(dataframe-filter* (juv juv-mean) (> juv juv-mean))
(dataframe-display))
dim: 2 rows x 5 cols
grp trt adult juv juv-mean
<sym> <sym> <num> <num> <num>
a b 2. 20. 15.
b b 5. 50. 40.
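The threading behavior described above can be illustrated with a small syntax-rules sketch. The names thread-first and thread-last are hypothetical and chosen for this example only; this is not the library's own definition of -> and ->>, just one way such macros could be written.

```scheme
(import (rnrs))

;; Hypothetical sketch; not the library's definition of ->.
;; Each step receives the previous result as its FIRST argument.
(define-syntax thread-first
  (syntax-rules ()
    [(_ x) x]
    [(_ x (f args ...) rest ...)
     (thread-first (f x args ...) rest ...)]))

;; Hypothetical sketch; not the library's definition of ->>.
;; Each step receives the previous result as its LAST argument.
(define-syntax thread-last
  (syntax-rules ()
    [(_ x) x]
    [(_ x (f args ...) rest ...)
     (thread-last (f args ... x) rest ...)]))

(thread-first 10 (- 3) (* 2))  ; expands to (* (- 10 3) 2) => 14
(thread-last 10 (- 3) (* 2))   ; expands to (* 2 (- 3 10)) => -14
```

The two results differ only because subtraction is not commutative, which makes the argument position visible.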
Missing values
procedure: (na? obj)
returns: #t if obj is 'na and #f otherwise
procedure: (any-na? lst)
returns: #t if any elements of lst are 'na and #f otherwise
> (na? 'na)
#t
> (na? "na")
#f
> (na? 'NA)
#f
> (any-na? (iota 10))
#f
> (any-na? (cons 'na (iota 10)))
#t
> (any-na? (cons "na" (iota 10)))
#f
procedure: (remove-na lst)
returns: a list with all 'na elements removed from lst
> (remove-na '(1 na 2 3))
(1 2 3)
> (remove-na '(1 NA 2 3))
(1 NA 2 3)
> (remove-na '(1 "na" 2 3))
(1 "na" 2 3)
procedure: (dataframe-remove-na df [name ...])
returns: a dataframe with any rows containing 'na removed; by default, all columns are checked for 'na; optionally, specify the name(s) of the columns to check
> (define df
(make-df*
(a 1 2 3 4 'na)
(b 'na 7 8 9 10)
(c 11 12 'na 14 15)))
> (dataframe-display (dataframe-remove-na df))
dim: 2 rows x 3 cols
a b c
<num> <num> <num>
2. 7. 12.
4. 9. 14.
> (dataframe-display (dataframe-remove-na df 'a 'c))
dim: 3 rows x 3 cols
a b c
<num> <num> <num>
1. na 11.
2. 7 12.
4. 9 14.
Descriptive statistics
procedure: (count obj lst)
returns: the number of occurrences of obj in lst
procedure: (count-elements lst)
returns: list of pairs (element . count) for every unique element in lst
procedure: (rle lst)
returns: list of pairs (element . count) for the run-length encoding of lst
procedure: (remove-duplicates lst)
returns: list of unique elements in lst
> (define x '(a b b c c c d d d d na))
> (count 'c x)
3
> (count 'e x)
0
> (count-elements x)
((a . 1) (b . 2) (c . 3) (d . 4) (na . 1))
> (rle x)
((a . 1) (b . 2) (c . 3) (d . 4) (na . 1))
> (rle '(1 1 2 1 1 0 2 2))
((1 . 2) (2 . 1) (1 . 2) (0 . 1) (2 . 2))
> (remove-duplicates x)
(a b c d na)
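To make the difference between count-elements and rle concrete, here is a minimal sketch of run-length encoding in plain R6RS Scheme. The name rle-sketch is hypothetical, and the library's rle may be implemented differently; the sketch only reproduces the documented behavior of counting consecutive runs.

```scheme
(import (rnrs))

;; Hypothetical helper; the library's rle may be implemented differently.
;; Walks the list once, extending the most recent run or starting a new one.
(define (rle-sketch lst)
  (let loop ([lst lst] [out '()])
    (cond
      [(null? lst) (reverse out)]
      ;; same element as the current run: increment its count
      [(and (pair? out) (equal? (caar out) (car lst)))
       (loop (cdr lst)
             (cons (cons (caar out) (+ 1 (cdar out))) (cdr out)))]
      ;; different element: start a new run with count 1
      [else (loop (cdr lst) (cons (cons (car lst) 1) out))])))

(rle-sketch '(1 1 2 1 1 0 2 2))  ; => ((1 . 2) (2 . 1) (1 . 2) (0 . 1) (2 . 2))
```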
procedure: (rep lst n type)
returns: list formed by repeating lst n times; type should be either 'times or 'each
> (rep '(1 2) 3 'times)
(1 2 1 2 1 2)
> (rep '(1 2) 3 'each)
(1 1 1 2 2 2)
procedure: (transpose lst)
returns: transposed list of elements in lst
> (transpose '((1 2 3 4) (5 6 7 8)))
((1 5) (2 6) (3 7) (4 8))
> (transpose '((1 5) (2 6) (3 7) (4 8)))
((1 2 3 4) (5 6 7 8))
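For a list of equal-length lists, the same result can be obtained in plain Scheme with apply and map. The name transpose-sketch is hypothetical, and this is only a sketch of the idea, not necessarily how the library implements transpose.

```scheme
(import (rnrs))

;; Hypothetical helper; assumes lst is a non-empty list of equal-length lists.
;; (apply map list '((1 2) (3 4))) is the same as (map list '(1 2) '(3 4)).
(define (transpose-sketch lst)
  (apply map list lst))

(transpose-sketch '((1 2 3 4) (5 6 7 8)))  ; => ((1 5) (2 6) (3 7) (4 8))
```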
procedure: (sum lst [na-rm])
returns: the sum of the values in lst; na-rm defaults to #t
> (sum (iota 10))
45
> (apply + (iota 10))
45
> (sum (cons 'na (iota 10)))
45
> (apply + (cons 'na (iota 10)))
Exception in +: na is not a number
> (sum (cons 'na (iota 10)) #f)
na
> (sum '(#t #f #t #f #t))
3
> (length (filter (lambda (x) x) '(#t #f #t #f #t)))
3
> (define df
(make-df*
(a 1 2 3)
(b 4 5 6)
(c 7 8 'na)))
> (dataframe-display
(dataframe-modify* df (row-sum (a b c) (sum (list a b c)))))
dim: 3 rows x 4 cols
a b c row-sum
<num> <num> <num> <num>
1. 4. 7 12.
2. 5. 8 15.
3. 6. na 9.
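The na-rm and boolean handling shown above can be sketched in plain R6RS Scheme. The name sum-sketch is hypothetical, and na-rm is a required argument here for simplicity; the library's sum takes it as an optional argument and may be implemented differently.

```scheme
(import (rnrs))

;; Hypothetical helper; a sketch of the documented behavior, not the library's code.
;; #t counts as 1 and #f as 0; 'na is either dropped (na-rm #t) or propagated.
(define (sum-sketch lst na-rm)
  (let ([vals (map (lambda (x)
                     (cond [(eq? x #t) 1]
                           [(eq? x #f) 0]
                           [else x]))
                   lst)])
    (cond
      [na-rm (apply + (filter (lambda (x) (not (eq? x 'na))) vals))]
      [(memq 'na vals) 'na]
      [else (apply + vals)])))

(sum-sketch '(na 1 2 3) #t)       ; => 6
(sum-sketch '(na 1 2 3) #f)       ; => na
(sum-sketch '(#t #f #t #f #t) #t) ; => 3
```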
procedure: (product lst [na-rm])
returns: the product of the values in lst; na-rm defaults to #t
> (product (map add1 (iota 10)))
3628800
> (apply * (map add1 (iota 10)))
3628800
> (product (cons 'na (map add1 (iota 10))))
3628800
> (product (cons 'na (map add1 (iota 10))) #f)
na
> (product '(#t #f #t #f #t))
0
procedure: (mean lst [na-rm])
returns: the arithmetic mean of the values in lst; na-rm defaults to #t
> (mean '(1 2 3 4 5))
3
> (mean '(-10 0 10))
0
> (mean '(-10 0 10 na) #f)
na
> (inexact (mean '(1 2 3 4 5 150)))
27.5
> (mean '(#t #f #t na))
2/3
procedure: (weighted-mean lst weights [na-rm])
returns: the arithmetic mean of the values in lst weighted by the values in weights; na-rm is only applied to lst and defaults to #t; any 'na in weights yields 'na
> (weighted-mean '(1 2 3 4 5) '(5 4 3 2 1))
7/3
> (weighted-mean '(1 2 3 4 na) '(5 4 3 2 1))
15/7
> (weighted-mean '(1 2 3 4 5) '(5 4 3 2 na))
na
> (weighted-mean '(1 2 3 4 5) '(2 2 2 2 2))
3
> (mean '(1 2 3 4 5))
3
> (weighted-mean '(1 2 3 4 5) '(2 0 2 2 2))
13/4
> (mean '(1 3 4 5))
13/4
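Ignoring 'na handling, the weighted mean is the sum of each value times its weight, divided by the sum of the weights. The sketch below uses the hypothetical name weighted-mean-sketch and assumes both lists are numeric and of equal length; it is not the library's implementation.

```scheme
(import (rnrs))

;; Hypothetical helper; assumes no 'na values and equal-length lists.
(define (weighted-mean-sketch lst weights)
  (/ (apply + (map * lst weights))   ; sum of value * weight
     (apply + weights)))             ; sum of weights

(weighted-mean-sketch '(1 2 3 4 5) '(5 4 3 2 1))  ; => 7/3
(weighted-mean-sketch '(1 2 3 4 5) '(2 0 2 2 2))  ; => 13/4
```

A weight of 0 removes a value from the calculation, which is why the second result matches (mean '(1 3 4 5)).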
procedure: (variance lst [na-rm])
returns: the sample variance of the values in lst based on Welford's algorithm; na-rm defaults to #t
> (inexact (variance '(1 10 100 1000)))
233840.25
> (variance '(0 1 2 3 4 5))
7/2
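For readers unfamiliar with Welford's algorithm, the following is a minimal single-pass sketch in plain R6RS Scheme. The name welford-variance is hypothetical, the sketch ignores 'na handling, and it is not taken from the library's source.

```scheme
(import (rnrs))

;; Hypothetical sketch of Welford's algorithm for the sample variance.
;; Keeps a running count n, running mean, and running sum of squared
;; deviations m2, updating all three once per element.
(define (welford-variance lst)
  (let loop ([lst lst] [n 0] [mean 0] [m2 0])
    (if (null? lst)
        (/ m2 (- n 1))                      ; requires at least two values
        (let* ([x (car lst)]
               [n (+ n 1)]
               [delta (- x mean)]           ; deviation from the old mean
               [mean (+ mean (/ delta n))]  ; updated mean
               [m2 (+ m2 (* delta (- x mean)))])
          (loop (cdr lst) n mean m2)))))

(welford-variance '(0 1 2 3 4 5))  ; => 7/2
```

Because the inputs are exact integers, Scheme carries exact rationals through the loop and the result is exact.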
procedure: (standard-deviation lst [na-rm])
returns: the standard deviation of the values in lst; na-rm defaults to #t
> (standard-deviation '(0 1 2 3 4 5))
1.8708286933869707
> (sqrt (variance '(0 1 2 3 4 5)))
1.8708286933869707
procedure: (median lst [type na-rm])
returns: the median of lst corresponding to the given type, which defaults to 8 (see quantile for more info on type); na-rm defaults to #t
> (median '(1 2 3 4 5 6))
3.5
> (quantile '(1 2 3 4 5 6) 0.5)
3.5
procedure: (quantile lst p [type na-rm])
returns: the sample quantile of the values in lst corresponding to the given probability, p, and type; na-rm defaults to #t
The quantile function follows Hyndman and Fan (1996), who recommend type 8, which is the default here. The default in R is type 7.
> (quantile '(1 2 3 4 5 6) 0.5 1)
3
> (quantile '(1 2 3 4 5 6) 0.5 4)
3.0
> (quantile '(1 2 3 4 5 6) 0.5 8)
3.5
> (quantile '(1 2 3 4 5 6) 0.025 7)
1.125
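As an illustration of how one of these types is computed, here is a minimal sketch of the type 7 rule (linear interpolation at position (n - 1)p in the sorted, zero-indexed data). The name quantile-type7 is hypothetical; the sketch ignores 'na handling and the other types, and it is not the library's implementation.

```scheme
(import (rnrs))

;; Hypothetical sketch of a type 7 quantile; assumes 0 <= p <= 1
;; and a non-empty list of numbers.
(define (quantile-type7 lst p)
  (let* ([sorted (list-sort < lst)]
         [n (length sorted)]
         [h (* (- n 1) p)]                   ; zero-based fractional index
         [j (exact (floor h))]               ; index of the lower neighbor
         [g (- h j)]                         ; interpolation weight
         [lo (list-ref sorted j)]
         [hi (list-ref sorted (min (+ j 1) (- n 1)))])
    (+ lo (* g (- hi lo)))))

(quantile-type7 '(1 2 3 4 5 6) 1/2)  ; => 7/2
```

Passing p as an exact rational keeps the result exact; an inexact p such as 0.5 gives an inexact result.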
procedure: (interquartile-range lst [type na-rm])
returns: the difference between the 0.75 and 0.25 sample quantiles of the values in lst corresponding to the given type, which defaults to 8 (see quantile for more info on type); na-rm defaults to #t
> (interquartile-range '(1 2 3 5 5))
3.3333333333333335
> (interquartile-range '(1 2 3 5 5) 1)
3
> (interquartile-range '(3 7 4 8 9 7) 9)
4.125
procedure: (cumulative-sum lst)
returns: a list that is the cumulative sum of the values in lst
> (cumulative-sum '(1 2 3 4 5))
(1 3 6 10 15)
> (cumulative-sum '(5 4 3 2 1))
(5 9 12 14 15)
> (cumulative-sum '(1 2 3 na 4))
(1 3 6 na na)
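The last example shows that once a 'na is encountered, every later element of the result is also 'na. A minimal sketch of that behavior in plain R6RS Scheme follows; the name cumsum-sketch is hypothetical and the library's cumulative-sum may be implemented differently.

```scheme
(import (rnrs))

;; Hypothetical helper; a sketch of the documented behavior only.
;; The running total becomes 'na at the first 'na and stays 'na afterwards.
(define (cumsum-sketch lst)
  (let loop ([lst lst] [total 0] [out '()])
    (cond
      [(null? lst) (reverse out)]
      [(or (eq? total 'na) (eq? (car lst) 'na))
       (loop (cdr lst) 'na (cons 'na out))]
      [else
       (let ([total (+ total (car lst))])
         (loop (cdr lst) total (cons total out)))])))

(cumsum-sketch '(1 2 3 na 4))  ; => (1 3 6 na na)
```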