mathjs icon indicating copy to clipboard operation
mathjs copied to clipboard

filter with index in the parser does not filter

Open dvd101x opened this issue 1 year ago • 5 comments

Hi,

Just found out this edge case in the parser on v13.1.1

filter([10, 20], f(x, index) = index == 1)     # yields [10, 20]
# expected [10]
filter([10, 20], f(x, index) = index == 1000)  # yields [10, 20]
# expected []

Outside the parser there is no issue.

math.filter([10, 20], (x, index) => index == 0)    // yields [10]
math.filter([10, 20], (x, index) => index == 1000) // yields []

I found it while working on #3256, I'm still trying to find the source of the error.

dvd101x avatar Sep 01 '24 03:09 dvd101x

That is a nice one. The tricky thing is that the index of the map and filter functions of mathjs is not a number but an array with one or multiple numbers, depending on the number of dimensions of the matrix. So the right way to do this is:

filter([10, 20], f(x, index) = index[1] == 1)    # [10] as expected
filter([10, 20], f(x, index) = index[1] == 1000) # []   as expected

The JavaScript code accidentally gives the right results because [0] == 0 and [1] == 1 returns true in JavaScript, whereas anything else like [0] === 1000 returns false. Apparently an array with one value is compared by taking the value from the array and compare that with the right hand side.

josdejong avatar Sep 04 '24 08:09 josdejong

Thank you Jos!

My bad. So I missed that the index is an array of numbers, not a number and the type conversion that happens with == instead of ===. I wasn't aware that [10] == 10 is true in js.

Something curious is that filter only takes 1D arrays, so I think in filter the index is always of the form [i] and never [i, j, ...]. Thus for the filter to work in the parser with an index, the index should always be index[1] and in js when comparing with === it should be index[0].

Returning to the cases I supplied I think it went like this:

filter([10, 20], f(x, index) = index == 1)     # yields [10, 20]
# [1] == 1    yields [true] which is thruthy
# [2] == 1    yields [false] which is thruthy

filter([10, 20], f(x, index) = index == 1000)  # yields [10, 20]
# [1] == 1000    yields [false] which is thruthy
# [2] == 1000    yields [false] which is thruthy

Now I understand it's not an issue of mathjs as the convention is for the index to be an array of numbers because in most contexts it can take n dimensional arrays.

Understanding this is not an issue, I don't know if for convenience, it would be nice for contexts that are 1D arrays (like filter) to accept an index that is a number and internally converting it to an array of one number.

dvd101x avatar Sep 07 '24 16:09 dvd101x

That is a good point, filter only supports 1d arrays so having the index being an array with a single value doesn't make much sense except for consistency with the map method 🤔. Changing this would be a breaking change though, and in general it is good to minimize breaking changes as much as possible. Let's think this though a bit more.

I've added this to the list with breaking changes for v14 as a reminder.

josdejong avatar Sep 11 '24 09:09 josdejong

I gave it some more thought.

What about an automatic conversion, if the index is a number, convert it to an array with only that number. Similar to what js already does where 123 == [123].

Would that be a breaking change?

dvd101x avatar Oct 07 '24 22:10 dvd101x

So then:

  • the index is a number for one dimensional arrays/matrices
  • the index is an array for 2d or multi-dimensional arrays/matrices

It is an interesting idea. I'm not sure if it is a good idea though, it is kind of an inconsistent behavior. This requires the callback to have knowledge about the number of dimensions of the array that it is operating on 🤔. Maybe it would be better to create a different version of map and filter that explicitly only work on 1d arrays, like map1 and filter1 or so? On the other hand, the issue may not be that big that it needs a solution. Using these functions from mathjs may give a WTF the first time you use it, but the API is constent and well documented so you probably figure it out quite fast.

josdejong avatar Oct 09 '24 07:10 josdejong

Hi Jos,

My intention was to make a proposal for filter to automatically convert a value that is not an array to an array containing that value automatically.

I did some attempts to use the original callback but couldn't make it work and then got occupied with other stuff.

Just to let you know it's on the back of my mind but I don't think I will be able to make a proof of concept until December.

dvd101x avatar Nov 07 '24 16:11 dvd101x

Thanks for the update. Before we put effort in a PR or POC I think we should decide on whether we want to adjust this behavior or not. I'll give that some more thought soon.

josdejong avatar Nov 08 '24 13:11 josdejong

Ok, thanks, I'm not sure if it was possible, so I was checking if there was an easy way to do both.

As a summary the idea was for both methods to work.

X = [10, 20, 30];
filter(X, f(x, i) = x + i[1]> 11)  # as index is always an array of one number it should always be i[1]
# Yields [20, 30]
filter(X, f(x, i) = x + i > 11)
# Currently yields [10, 20, 30] but the idea is to yield [20, 30]

In a similar fashion [101] == 101 yields true.

I understand there are more implications regarding the integrity of the language. I'll postpone any POC until we have a clearer view of the desired behavior.

dvd101x avatar Nov 08 '24 15:11 dvd101x

I understand your idea, though I don't have an idea on how to make that work 🤔.

josdejong avatar Nov 08 '24 20:11 josdejong

I've been thinking about pros and cons of changing index of function filter to be a number instead of array containing one number.

Pros:

  1. Simpler to use. There is no use case for index as an array, everyone has to read the first item from the array before being able to do something.
  2. Consistent with JavaScript's filter function

Cons:

  1. It is not consistent with map and forEach. When using filter and map in a chain for example, it is odd that in one case the index is a number and in an other case it is an array.
  2. Changing this behavior is a breaking change. It is best to minimize breaking changes.

I think the downsides are bigger than the upsides, I value consistency and non-breaking changes quite highly. Also, when using the index wrongly, it definitely results in a WTF moment, but I think in most cases it is not that hard to debug the issue. Are you ok with leaving the behavior as is @dvd101x? Or do you see more pros/cons?

I think we should at least improve the documentation of map, filter, and forEach: describe what index is and how to use it. Also, we should improve the type definitions of index for these three functions: that prevents the issue when in a TypeScript environment.

josdejong avatar Nov 15 '24 13:11 josdejong

Hi Jos, thanks for the through consideration and review.

I don't have further comments in favor of this behavior other than what was previously discussed. Specially since I was unable to make it work for both cases I tend to agree with your decision and I have no strong objections about keeping it what it is, since I seldom use filter with an index.

I agree further documentation of index is needed for the callbacks in map, forEach and filter.

As a side note, I noticed there is a function called index that isn't found on functions even though there is documentation for the index function and can be accessed from the see also on subset.

dvd101x avatar Nov 15 '24 21:11 dvd101x

I've improved the documentation and type definitions via https://github.com/josdejong/mathjs/commit/67ddc724e1345fcaf90fc9d4c6a6219fd17d7bf3. This is no breaking change.

josdejong avatar Nov 20 '24 10:11 josdejong

The improvements are published now in v13.2.3

josdejong avatar Nov 20 '24 12:11 josdejong