daru
Inconsistencies in the behaviour of DataFrame.
EDIT (@v0dro): The following is a list of methods that should be implemented or corrected to make the behaviour more consistent:
- [ ] `Vector#last`.
- [ ] `DataFrame#last`.
- [ ] Return type of `DataFrame#[]` must be consistent when using a timeseries. It currently returns either a numerical value or another `Vector` or `DataFrame`, depending on what you pass into `#[]`.
- [ ] Return `nil` when an element is not present in the `DataFrame` (currently raises an error).
Ideally these should be split into separate issues and tackled one at a time.
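Ruby's own `Hash` already follows the convention that the last checklist item asks for: `#[]` returns `nil` for a missing key, while `#fetch` raises `KeyError`. A minimal sketch of a date-keyed lookup with the same split, assuming plain arrays of `Date`s rather than Daru's real `DateTimeIndex`. The `DateLookup` class is hypothetical and not part of Daru:

```ruby
require 'date'

# Hypothetical date-keyed lookup (not Daru's API) that mirrors Hash:
# #[] returns nil for a missing date, #fetch raises KeyError.
class DateLookup
  def initialize(dates, values)
    # Map each Date to its value for exact-match lookup.
    @map = dates.zip(values).to_h
  end

  # Returns the value stored for the given date string, or nil if absent.
  def [](date)
    @map[Date.parse(date)]
  end

  # Like #[], but raises KeyError when the date is absent.
  def fetch(date)
    @map.fetch(Date.parse(date))
  end
end

lookup = DateLookup.new([Date.new(2018, 3, 30), Date.new(2018, 4, 2)], [1.00000001, 0.9999])
lookup['2018-03-30'] # => 1.00000001
lookup['2018-12-31'] # => nil
```

Keeping a raising variant alongside the `nil`-returning `#[]` would let callers who rely on the current error behaviour opt back into it explicitly.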
I'd like to use the following data to show the behaviour that confused me:
```
[25] pry(main)> dates=["2018-03-30", "2018-04-02", "2018-04-27", "2018-05-31", "2018-06-29", "2018-07-31", "2018-08-31", "2018-09-28", "2018-10-31", "2018-11-30"]
=> ["2018-03-30",
"2018-04-02",
"2018-04-27",
"2018-05-31",
"2018-06-29",
"2018-07-31",
"2018-08-31",
"2018-09-28",
"2018-10-31",
"2018-11-30"]
[26] pry(main)> val=[1.00000001, 0.9999, 0.9908, 1.0885, 1.0586, 1.0374, 0.9456, 0.9638, 0.8397, 0.8788]
=> [1.00000001, 0.9999, 0.9908, 1.0885, 1.0586, 1.0374, 0.9456, 0.9638, 0.8397, 0.8788]
[27] pry(main)> id=Daru::DateTimeIndex.new(dates)
=> #<Daru::DateTimeIndex(10) 2018-03-30T00:00:00+00:00...2018-11-30T00:00:00+00:00>
[28] pry(main)> df = Daru::DataFrame.new({val: val}, index: id)
=> #<Daru::DataFrame(10x1)>
val
2018-03-30 1.00000001
2018-04-02 0.9999
2018-04-27 0.9908
2018-05-31 1.0885
2018-06-29 1.0586
2018-07-31 1.0374
2018-08-31 0.9456
2018-09-28 0.9638
2018-10-31 0.8397
2018-11-30 0.8788
```
- first & last
```
[29] pry(main)> df.val.first
=> 1.00000001
[30] pry(main)> df.val.last
NoMethodError: undefined method `last' for #<Daru::Vector:0x00007f43dbc591f0>
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/vector.rb:1420:in `method_missing'
# I expected this to return 0.8788
```
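A likely explanation, assuming `Daru::Vector` includes `Enumerable` and defines `#each` (which the working `#first` suggests): Ruby's `Enumerable` provides `#first` but has no `#last`, so the call falls through to `method_missing`. The sketch below uses a hypothetical `TinyVector` stand-in, not Daru's class, to show the gap and one way to close it with an explicit `#last` modelled on `Array#last`:

```ruby
# Hypothetical stand-in for a vector-like container; not Daru's actual implementation.
class TinyVector
  include Enumerable

  def initialize(data)
    @data = data
  end

  # Enumerable derives #first, #map, #select, etc. from #each.
  def each(&block)
    return enum_for(:each) unless block_given?
    @data.each(&block)
    self
  end

  # Enumerable defines no #last, so it has to be added explicitly.
  # Mirrors Array#last: no argument returns one element, last(n) returns up to n elements.
  def last(*args)
    @data.last(*args)
  end
end

v = TinyVector.new([1.00000001, 0.9999, 0.8788])
v.first   # => 1.00000001 (inherited from Enumerable)
v.last    # => 0.8788
v.last(2) # => [0.9999, 0.8788]
```

In Daru itself, `last(n)` would presumably need to return a `Vector` that keeps its index labels rather than a plain `Array`; that choice is left open here.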
- The return type
```
[31] pry(main)> df.val['2018-03-30','2018-04-30']
=> #<Daru::Vector(3)>
val
2018-03-30T00:00:00+ 1.00000001
2018-04-02T00:00:00+ 0.9999
2018-04-27T00:00:00+ 0.9908
[32] pry(main)> df.val['2018-04']
=> #<Daru::Vector(2)>
val
2018-04-02T00:00:00+ 0.9999
2018-04-27T00:00:00+ 0.9908
[33] pry(main)> df.val['2018-03-30','2018-04-01']
=> 1.00000001
[34] pry(main)> df.val['2018-03']
=> 1.00000001
# I expected both [33] and [34] to return:
# => #<Daru::Vector(1)>
# val
# 2018-03-30T00:00:00+ 1.00000001
```
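The expectation above amounts to "a period or range lookup always returns a collection, even when only one row matches". A minimal sketch of that rule over plain arrays, assuming partial dates are written as `YYYY`, `YYYY-MM` or `YYYY-MM-DD`. `period_range` and `slice_period` are hypothetical helpers, not part of Daru:

```ruby
require 'date'

# Hypothetical helper (not Daru's API): turn a partial date string such as
# "2018", "2018-03" or "2018-03-30" into the inclusive Date range it covers.
def period_range(key)
  parts = key.split('-').map(&:to_i)
  case parts.size
  when 1 then Date.new(parts[0], 1, 1)..Date.new(parts[0], 12, 31)
  when 2 then Date.new(parts[0], parts[1], 1)..Date.new(parts[0], parts[1], -1) # -1 = last day of month
  when 3 then Date.new(*parts)..Date.new(*parts)
  else raise ArgumentError, "unsupported key: #{key}"
  end
end

# Always returns an Array of [date, value] pairs, even when a single row (or
# none) matches, so the caller never has to check for a scalar result.
def slice_period(dates, values, from, to = from)
  range = period_range(from).begin..period_range(to).end
  dates.zip(values).select { |date, _value| range.cover?(date) }
end

dates  = %w[2018-03-30 2018-04-02 2018-04-27].map { |s| Date.parse(s) }
values = [1.00000001, 0.9999, 0.9908]

slice_period(dates, values, '2018-03')                  # one pair, still wrapped in an Array
slice_period(dates, values, '2018-03-30', '2018-04-01') # the same single pair
slice_period(dates, values, '2018-04')                  # two pairs
```

Plain arrays only keep the sketch self-contained; in Daru the equivalent result would be a one-row `Vector` rather than a bare `Float`.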
- Errors, and one case that doesn't raise but returns the wrong rows
```
[48] pry(main)> df.val['2018']
=> #<Daru::Vector(10)>
val
2018-03-30T00:00:00+ 1.00000001
2018-04-02T00:00:00+ 0.9999
2018-04-27T00:00:00+ 0.9908
2018-05-31T00:00:00+ 1.0885
2018-06-29T00:00:00+ 1.0586
2018-07-31T00:00:00+ 1.0374
2018-08-31T00:00:00+ 0.9456
2018-09-28T00:00:00+ 0.9638
2018-10-31T00:00:00+ 0.8397
2018-11-30T00:00:00+ 0.8788
[49] pry(main)> df.val['2017']
ArgumentError: Key 2017 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
[50] pry(main)> df.val['2019']
ArgumentError: Key 2019 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
[52] pry(main)> df.val['2018-12']
ArgumentError: bad value for range
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:547:in `slice_between_dates'
[53] pry(main)> df.val['2018-02']
=> #<Daru::Vector(10)>
val
2018-03-30T00:00:00+ 1.00000001
2018-04-02T00:00:00+ 0.9999
2018-04-27T00:00:00+ 0.9908
2018-05-31T00:00:00+ 1.0885
2018-06-29T00:00:00+ 1.0586
2018-07-31T00:00:00+ 1.0374
2018-08-31T00:00:00+ 0.9456
2018-09-28T00:00:00+ 0.9638
2018-10-31T00:00:00+ 0.8397
2018-11-30T00:00:00+ 0.8788
# I expected all of the calls that raise errors, as well as [53], to return an empty #<Daru::Vector(0)>
```
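Since the index is sorted, one way to get the behaviour expected above is to locate both bounds with binary search and slice between them: a period past either end of the data then yields an empty slice rather than an `ArgumentError`, and a month with no rows, such as February 2018, also gives an empty slice rather than every row. A minimal sketch over plain arrays, assuming the dates are already sorted ascending and the partial strings are written out as explicit `Date` bounds. `slice_between` is a hypothetical helper, not part of Daru:

```ruby
require 'date'

# Hypothetical helper (not Daru's API): return the [date, value] pairs whose
# date lies in from..to (inclusive). +dates+ must be sorted ascending.
# Bounds outside the data yield an empty Array instead of raising.
def slice_between(dates, values, from, to)
  # Index of the first date >= from, or dates.size if there is none.
  lo = dates.bsearch_index { |d| d >= from } || dates.size
  # Index of the first date > to, or dates.size if there is none.
  hi = dates.bsearch_index { |d| d > to } || dates.size
  return [] if hi <= lo

  dates[lo...hi].zip(values[lo...hi])
end

dates = %w[2018-03-30 2018-04-02 2018-04-27 2018-05-31 2018-06-29
           2018-07-31 2018-08-31 2018-09-28 2018-10-31 2018-11-30].map { |s| Date.parse(s) }
values = [1.00000001, 0.9999, 0.9908, 1.0885, 1.0586, 1.0374, 0.9456, 0.9638, 0.8397, 0.8788]

slice_between(dates, values, Date.new(2017, 1, 1), Date.new(2017, 12, 31))      # => []
slice_between(dates, values, Date.new(2019, 1, 1), Date.new(2019, 12, 31))      # => []
slice_between(dates, values, Date.new(2018, 12, 1), Date.new(2018, 12, 31))     # => []
slice_between(dates, values, Date.new(2018, 2, 1), Date.new(2018, 2, 28))       # => []
slice_between(dates, values, Date.new(2018, 1, 1), Date.new(2018, 12, 31)).size # => 10
```

Binary search keeps each lookup at O(log n) on the sorted index, and no special-casing is needed for out-of-range keys: both bounds simply land on the same position.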
I think the gods left room for you to contribute. Sorry, I'm just kidding.
I'm editing the issue description to make an itemized list of items that can be tackled by a group of volunteers.