daru icon indicating copy to clipboard operation
daru copied to clipboard

Daru::Index#[] should only be used for index value and Index#at for position.

Open genya0407 opened this issue 8 years ago • 26 comments

Daru::Index#[] behaves as follows:

index = Daru::Index.new([100, 200, 300])

# access with Range:
index[0..1] # => #<Daru::Index(2): {100, 200}>

# access with Integer:
index[1] # => 1

# What I assumed:
index.at(1) # => 200

From the behavior of index[0..1], I assumed that index[] takes position as argument and returns value of index.

In other words, Index#[] sometimes takes position as argument and at other times takes value.

Is there any special reason for this confusing (at least for me) behavior ?

genya0407 avatar Dec 04 '16 10:12 genya0407

I think @lokeshh can answer this best.

v0dro avatar Dec 07 '16 18:12 v0dro

Daru::Index#[] method in Daru takes index value or position as it argument and is supposed to give only position as its output. In case the input is both a valid value and a valid position, argument is assumed to be a value.

The reason for this behavior is this. Daru::Vector#[] also accepts both index value and position as its argument and return a vector value with preference given to index value over position. So, when user specifies an argument to Daru::Vector#[] we simply pass that argument to Daru::Index#[] and it returns us a position and that position is used to find the corresponding vector value.

For example, Vector is stored as Index + Array. So, a vector {:a => 100, :b => 200, :c => 300} will be stored as a index [:a, :b, :c] and an array [100, 200, 300]. When user specifies the index value/position in Vector#[] we pass that on to Index#[] and it returns a position regardless if index value is passed or position and we then use that position to return the corresponding element from the array [100, 200, 300].

So, you can say to support both index value and position in Vector#[] we have allowed Index#[] to accept both a index value and position and return a position.

lokeshh avatar Dec 07 '16 19:12 lokeshh

@v0dro @lokeshh I got it. Thanks!

However, I still think those behavior is confusing. Especially, when I use Integer index, I always have to be careful whether the argument is interpreted as index value or position. This tires my brain.

My suggestion is: #[] method takes only index value, and if we wanna access by position, we use #at method.

How do you think about it?

genya0407 avatar Dec 08 '16 07:12 genya0407

If you're using integer index only why not use only #at?

However, I like this proposition. The #[] method can raise an error if the index value is not found, irrespective of the kind of object that the index value is.

@zverok @gnilrets WDYT? I think @zverok has talked about this before too if I remember correctly...

v0dro avatar Dec 16 '16 14:12 v0dro

We've had several unexpected bugs arise (see the last group_by issue last week) due to #[] serving two purposes. I strongly prefer using #at for position and #[] for index value.

gnilrets avatar Dec 16 '16 15:12 gnilrets

Alright then. Marking this as an enhancement.

v0dro avatar Dec 17 '16 11:12 v0dro

Shall we also do the same for Daru::Vector#[]?

lokeshh avatar Dec 17 '16 13:12 lokeshh

@lokeshh Yeah works for me. I think we should. Will remove confusion once and for all.

v0dro avatar Dec 18 '16 07:12 v0dro

Maybe I should have proposed earlier, but another solution is creating new method that only accepts index value.

In other words, we will have three methods at, of, and []:

  • at only accepts position
  • of only accepts index value
  • [] accepts both.

(There might be more suitable name for of method...)

@v0dro @gnilrets How do you think about this?

genya0407 avatar Dec 23 '16 17:12 genya0407

I think that [] should be predictable. It currently isn't because if an index doesn't exist then it falls back to position and there is no predictable relationship between index and position. I also think it should accept an index since that seems to be the default reference mechanism.

gnilrets avatar Dec 24 '16 21:12 gnilrets

I also don't think we should go for of. [] and at should be sufficient.

I can't think of use cases where a method that accepts both index and position is being used when dedicated methods exist for that purpose. If such a need does arise we can simply build a wrapper over at and [] (not the current versions of these methods, of course).

v0dro avatar Dec 25 '16 20:12 v0dro

I want to take up this issue.

ananyo2012 avatar Mar 09 '17 17:03 ananyo2012

What should be the output for multiple keys like index[100,200] ? In case one of the index is valid and the other one is not what should be the output?

ananyo2012 avatar Mar 09 '17 18:03 ananyo2012

What should be the output for multiple keys like index[100,200] ? In case one of the index is valid and the other one is not what should be the output?

What regular Ruby's arrays return in this case?

zverok avatar Mar 10 '17 12:03 zverok

Regular ruby arrays show a range of values between the comma separated indices, i.e. providing a range or comma separated values give the same output in arrays.

ananyo2012 avatar Mar 10 '17 13:03 ananyo2012

Well, I believe when first index is valid, then everything from this index to the end of Index should be returned.

zverok avatar Mar 10 '17 13:03 zverok

Should it raise an error if the first index is invalid or should it return nil as in regular Ruby arrays?

ananyo2012 avatar Mar 10 '17 13:03 ananyo2012

Just nil

zverok avatar Mar 10 '17 13:03 zverok

What should be the output for multiple keys like index[100,200] ? In case one of the index is valid and the other one is not what should be the output?

I think IndexError should be raised.

Reasons:

  1. Daru::Index#[] raises error when given invalid argument.
index = Daru::Index.new([:a, :b, :c, :d])
index[:x] # => IndexError: Specified index :x does not exist
  1. array[100, 200] and index[100, 200] has different meanings.

array[start, length] means array[start...(start+length)]. On the other hand, index[index1, index2] means Daru::Index.new([index[index1], index[index2]]).

Examples:

array = [:a, :b, :c, :d]
array[1, 3] # => [:b, :c, :d]

index = Daru::Index.new(array)
index[:b, :d] # => #<Daru::Index(2): {b, d}>

Thanks.

reference: https://ruby-doc.org/core-2.2.0/Array.html#method-i-5B-5D

genya0407 avatar Mar 10 '17 15:03 genya0407

Well I think we should return nil for indexes that don't exist (only support indexes not position)and the positions for indexes that exist as that was discussed

index = Daru::Index.new([:a, :b, :c, :d])
index[:a, :b] # => [0, 1]
index[:a, :d] # => [0, nil]
index[:d, :e] # => nil

ananyo2012 avatar Mar 10 '17 20:03 ananyo2012

Also should we deprecate range options in Daru::Index#[] to avoid ambiguity? As it is already available in Daru::Index#at and makes more sense there.

ananyo2012 avatar Mar 10 '17 20:03 ananyo2012

Well I think we should return nil for indexes that don't exist (only support indexes not position)and the positions for indexes that exist as that was discussed

Why Daru::Index#[] should return nil ?

If IndexError is raised, users can notice that they specify wrong index. However, if nil is returned, error will raised from other place. I think this will complicate debug process.

If you have any advantages of returning nil, please tell me it.

Thanks.

genya0407 avatar Mar 11 '17 07:03 genya0407

If you have any advantages of returning nil, please tell me it.

Well, most of the time we a copying Ruby array's/hashes behavior. Just to be consistent. Returning nil instead of errors is OK for Array#[] and Hash#[] -- I believe, that's because when index is taken from variable, you can have data in one case, and just "no data" in another, and it is not an "error".

zverok avatar Mar 11 '17 13:03 zverok

I believe, that's because when index is taken from variable, you can have data in one case, and just "no data" in another, and it is not an "error".

OK. I undestand the advantage, and agree with it.

genya0407 avatar Mar 12 '17 09:03 genya0407

So I am going by these definitions. Daru::Index#[] will return index object if positions are given as argument and return positions if argument is index values and return nil if no values are found. No error is raised. Some examples

index = Daru::Index.new([:a, :b, :c, :d])

index[0..3] # => #<Daru::Index(4): {a, b, c, d}>
index[:a..:c] # => [0, 1, 2]
index[0, 3] # => #<Daru::Index(2): {a, d}>
index[:a, :c] # => [0, 2]
index[0] # => :a
index[:a] # => 0
index[:g] # => nil
index[:e..:f] # => nil

Daru::Index#pos returns positions corresponding to a valid index value. It only accepts valid index value as arguments. If an invalid index or position is given as argument it shows IndexError. Example

index.pos(:a) # => 0
index.pos(:a, :k) # => IndexError: k is not a valid index
index.pos(1) # => IndexError: 1 is not a valid index

Daru::Index#at returns indexes corresponding to valid positions. It only accepts valid positions as arguments and raises error for invalid positions. Example

index.at(0) # => :a
index.at(:a) # => ArgumentError: Unkown position type.
index.at(5) # => IndexError: 5 is not a valid position.

ananyo2012 avatar Mar 27 '17 17:03 ananyo2012

Can this be closed given the PR merged above?

baarkerlounger avatar Aug 05 '17 15:08 baarkerlounger