mir-algorithm icon indicating copy to clipboard operation
mir-algorithm copied to clipboard

bug or feature? sliced.elementCount != arr.length

Open mw66 opened this issue 5 years ago • 13 comments

  // http://mir-algorithm.libmir.org/mir_ndslice_slice.html#sliced
  auto arr = [
      1, 2, 3,
      4, 5, 6];
  auto s = arr.sliced(2, 1);
  Assert.equal(s.elementCount, arr.length);  // failed
  s.prettyArr.writeln;

Expected:6 Actual:2

When user turn an array into a slice, I think most of the time they want turn the whole array into a N-d slice; if this is a feature, I'd think it's a mis-feature, should be better documented (so no surprise to the user), and maybe also issue a run-time warning by default, if the converted sliced.elementCount != arr.length

mw66 avatar Aug 06 '20 17:08 mw66

Use elementCount to get the number of elements in the whole slice. length works the same way that it does for D's dynamic arrays. For instance, the following compiles without error

  auto arr = [
      [1, 2, 3],
      [4, 5, 6]
 ];
  auto s = arr.sliced(2, 1);
  assert(s.length == arr.length);

jmh530 avatar Aug 06 '20 17:08 jmh530

Maybe I didn't make it clear, what I mean is:

"When user turn a 1-d array into a N-d slice, ... "

in my original example, what the intended is turn a 1-d array of 6 elements into a (2, 3) 2-d slice; and if the user mistakenly write (2, 1), the library should generate at least an warning.

Just as reshape does:

http://mir-algorithm.libmir.org/mir_ndslice_topology.html#reshape

Actually, I think function sliced should be merged into function reshape, it will be much more consistent with numpy's interface & behaviour.

As a work-around (i.e. the intended operations):

  auto r = arr.sliced(arr.length).reshape([2, 3], err);
  Assert.equal(err, 0);

In your modified example, the input is a 2-d array already.

mw66 avatar Aug 06 '20 17:08 mw66

I see your point.

Your issue is really that the result of arr.sliced(2, 1) is [[1], [2]] without any error. Right now, the relevant section of the code you would be interested in in mir.ndslice.slice is

    static if (isDynamicArray!Iterator)
    {
        assert(lengthsProduct(_lengths) <= iterator.length,
            "array length should be greater or equal to the product of constructed ndslice lengths");
        auto ptr = iterator.length ? &iterator[0] : null;
        return Slice!(typeof(C.init[0])*, N)(_lengths, ptr);
    }

which will generate an assert error, for instance, if you use (7,1). It seems deliberate that it is a <= instead of ==. There is also an overload for if you leave off the (2, 1), then the result is what you would expect.

Also, you might find that fuse is a little less susceptible to bugs for this use case (or just chaining sliced to reshape, as you suggest).

  auto arr = [
      [1, 2, 3],
      [4, 5, 6]
];
  auto s = arr.fuse;

jmh530 avatar Aug 06 '20 17:08 jmh530

"... It seems deliberate that it is a <= instead of ==. "

If you can ask the lib users to vote on this issue; or add network code to send actual usage stats to you, my gut feeling is that > 99% of the time, user want ==.

Silently perform <= will cause more issues as software evolve, actually that's why I found this bug / feature:

In my program, I convert an 1-d array to 2-d slice for numerical processing; after some code change, there are more elements appended into the 1-d array, but I forget to change the sliced dimension. And the program continue to run silently without complaining anything, but the computation result is wrong: worse, silently wrong, because in numerical computations, you know, if the result is (say 1%) off the correct answer, typically user won't notice it easily.

If this is in numpy, I will be alerted of the mis-match new dimension problem easily.

That's why I'm suggesting: sliced should be merged into function reshape, it will be much more consistent with numpy's interface & behavior.

mw66 avatar Aug 06 '20 18:08 mw66

And for the current sliced behavior (i.e only take partial array as converted N-d array from the head of the input array): the user can always do it explicitly via D's built-in array slice syntax on the 1-d array, e.g. from the original example:

  auto arr = [
      1, 2, 3,
      4, 5, 6];
  auto s = arr[0..2].sliced(2, 1);   // take the head 2 elements
  auto t = arr[2..4].sliced(2, 1);   // take the 2 elements in the middle

The point is that user do it explicitly, and the library enforce the assertion that assert(lengthsProduct(_lengths) == iterator.length).

mw66 avatar Aug 07 '20 16:08 mw66

should be fixed in v3.10

9il avatar Aug 13 '20 14:08 9il

should be fixed in v3.10

Excellent. Thanks.

mw66 avatar Aug 14 '20 02:08 mw66

should be fixed in v3.10

Excellent. Thanks.

It may take a while.

9il avatar Aug 14 '20 02:08 9il

It is a breaking change, but quite small for a release. Needs to wait another one breaking change.

9il avatar Aug 14 '20 02:08 9il

I'm patient :-)

mw66 avatar Aug 14 '20 04:08 mw66

I saw this new user's mistake:

https://forum.dlang.org/post/[email protected]

i.e. sliced -> fuse

Since there will be a new breaking (or deprecation) change release, I'd want to suggest we should deprecate both sliced and fuse, and move all their functionality into reshape.

  1. it will be a single function reshape, that user need to remember
  2. it's more consistent with numpy, i.e more friendly to new users.

mw66 avatar Aug 20 '20 06:08 mw66

should be fixed in v3.10

It's v3.14.14 now :-) or it's done already?

mw66 avatar Aug 13 '22 04:08 mw66

Oh, it isn't. To be honest this change will require to much efforts to update all the codebase we support.

Instead, we can add slicedExactly, that will throw of the array length doesn't equal to the element count.

9il avatar Aug 14 '22 12:08 9il