daru
daru copied to clipboard
Daru::Index#[] should only be used for index value and Index#at for position.
Daru::Index#[] behaves as follows:
index = Daru::Index.new([100, 200, 300])
# access with Range:
index[0..1] # => #<Daru::Index(2): {100, 200}>
# access with Integer:
index[1] # => 1
# What I assumed:
index.at(1) # => 200
From the behavior of index[0..1], I assumed that index[] takes position as argument and returns value of index.
In other words, Index#[] sometimes takes position as argument and at other times takes value.
Is there any special reason for this confusing (at least for me) behavior ?
I think @lokeshh can answer this best.
Daru::Index#[] method in Daru takes index value or position as it argument and is supposed to give only position as its output. In case the input is both a valid value and a valid position, argument is assumed to be a value.
The reason for this behavior is this. Daru::Vector#[] also accepts both index value and position as its argument and return a vector value with preference given to index value over position. So, when user specifies an argument to Daru::Vector#[] we simply pass that argument to Daru::Index#[] and it returns us a position and that position is used to find the corresponding vector value.
For example, Vector is stored as Index + Array. So, a vector {:a => 100, :b => 200, :c => 300} will be stored as a index [:a, :b, :c] and an array [100, 200, 300]. When user specifies the index value/position in Vector#[] we pass that on to Index#[] and it returns a position regardless if index value is passed or position and we then use that position to return the corresponding element from the array [100, 200, 300].
So, you can say to support both index value and position in Vector#[] we have allowed Index#[] to accept both a index value and position and return a position.
@v0dro @lokeshh I got it. Thanks!
However, I still think those behavior is confusing. Especially, when I use Integer index, I always have to be careful whether the argument is interpreted as index value or position. This tires my brain.
My suggestion is: #[] method takes only index value, and if we wanna access by position, we use #at method.
How do you think about it?
If you're using integer index only why not use only #at?
However, I like this proposition. The #[] method can raise an error if the index value is not found, irrespective of the kind of object that the index value is.
@zverok @gnilrets WDYT? I think @zverok has talked about this before too if I remember correctly...
We've had several unexpected bugs arise (see the last group_by issue last week) due to #[] serving two purposes. I strongly prefer using #at for position and #[] for index value.
Alright then. Marking this as an enhancement.
Shall we also do the same for Daru::Vector#[]?
@lokeshh Yeah works for me. I think we should. Will remove confusion once and for all.
Maybe I should have proposed earlier, but another solution is creating new method that only accepts index value.
In other words, we will have three methods at, of, and []:
atonly accepts positionofonly accepts index value[]accepts both.
(There might be more suitable name for of method...)
@v0dro @gnilrets How do you think about this?
I think that [] should be predictable. It currently isn't because if an index doesn't exist then it falls back to position and there is no predictable relationship between index and position. I also think it should accept an index since that seems to be the default reference mechanism.
I also don't think we should go for of. [] and at should be sufficient.
I can't think of use cases where a method that accepts both index and position is being used when dedicated methods exist for that purpose. If such a need does arise we can simply build a wrapper over at and [] (not the current versions of these methods, of course).
I want to take up this issue.
What should be the output for multiple keys like index[100,200] ? In case one of the index is valid and the other one is not what should be the output?
What should be the output for multiple keys like
index[100,200]? In case one of the index is valid and the other one is not what should be the output?
What regular Ruby's arrays return in this case?
Regular ruby arrays show a range of values between the comma separated indices, i.e. providing a range or comma separated values give the same output in arrays.
Well, I believe when first index is valid, then everything from this index to the end of Index should be returned.
Should it raise an error if the first index is invalid or should it return nil as in regular Ruby arrays?
Just nil
What should be the output for multiple keys like index[100,200] ? In case one of the index is valid and the other one is not what should be the output?
I think IndexError should be raised.
Reasons:
Daru::Index#[]raises error when given invalid argument.
index = Daru::Index.new([:a, :b, :c, :d])
index[:x] # => IndexError: Specified index :x does not exist
array[100, 200]andindex[100, 200]has different meanings.
array[start, length] means array[start...(start+length)].
On the other hand, index[index1, index2] means Daru::Index.new([index[index1], index[index2]]).
Examples:
array = [:a, :b, :c, :d]
array[1, 3] # => [:b, :c, :d]
index = Daru::Index.new(array)
index[:b, :d] # => #<Daru::Index(2): {b, d}>
Thanks.
reference: https://ruby-doc.org/core-2.2.0/Array.html#method-i-5B-5D
Well I think we should return nil for indexes that don't exist (only support indexes not position)and the positions for indexes that exist as that was discussed
index = Daru::Index.new([:a, :b, :c, :d])
index[:a, :b] # => [0, 1]
index[:a, :d] # => [0, nil]
index[:d, :e] # => nil
Also should we deprecate range options in Daru::Index#[] to avoid ambiguity? As it is already available in Daru::Index#at and makes more sense there.
Well I think we should return nil for indexes that don't exist (only support indexes not position)and the positions for indexes that exist as that was discussed
Why Daru::Index#[] should return nil ?
If IndexError is raised, users can notice that they specify wrong index.
However, if nil is returned, error will raised from other place.
I think this will complicate debug process.
If you have any advantages of returning nil, please tell me it.
Thanks.
If you have any advantages of returning nil, please tell me it.
Well, most of the time we a copying Ruby array's/hashes behavior. Just to be consistent. Returning nil instead of errors is OK for Array#[] and Hash#[] -- I believe, that's because when index is taken from variable, you can have data in one case, and just "no data" in another, and it is not an "error".
I believe, that's because when index is taken from variable, you can have data in one case, and just "no data" in another, and it is not an "error".
OK. I undestand the advantage, and agree with it.
So I am going by these definitions. Daru::Index#[] will return index object if positions are given as argument and return positions if argument is index values and return nil if no values are found. No error is raised. Some examples
index = Daru::Index.new([:a, :b, :c, :d])
index[0..3] # => #<Daru::Index(4): {a, b, c, d}>
index[:a..:c] # => [0, 1, 2]
index[0, 3] # => #<Daru::Index(2): {a, d}>
index[:a, :c] # => [0, 2]
index[0] # => :a
index[:a] # => 0
index[:g] # => nil
index[:e..:f] # => nil
Daru::Index#pos returns positions corresponding to a valid index value. It only accepts valid index value as arguments. If an invalid index or position is given as argument it shows IndexError. Example
index.pos(:a) # => 0
index.pos(:a, :k) # => IndexError: k is not a valid index
index.pos(1) # => IndexError: 1 is not a valid index
Daru::Index#at returns indexes corresponding to valid positions. It only accepts valid positions as arguments and raises error for invalid positions. Example
index.at(0) # => :a
index.at(:a) # => ArgumentError: Unkown position type.
index.at(5) # => IndexError: 5 is not a valid position.
Can this be closed given the PR merged above?