ndarray icon indicating copy to clipboard operation
ndarray copied to clipboard

Better shape: Deprecate reshape, into_shape

Open bluss opened this issue 2 years ago • 5 comments
trafficstars

into_shape has some slightly questionable default behaviour w.r.t how it handles memory layout. The newer to_shape() fixes those problems! You should use to_shape() if you can.

Into shape doesn't do it /wrong/ but when it has a successful result, it preserves memory layout. C in gives C out. F in gives F out. But if you don't know what memory layout you have going in, into_shape can surprise you.

Into shape tries to be too great method:

  • User doesn't have to specify index ordering, it just guesses
  • It doesn't fall back to copy or clone
    • Useful for: Owned arrays with non-clonable elements
    • Useful for views: We can't fall back to cloning when reshaping a view

Into shape has a niche. Especially the no-copy shape transformation of views is very useful. But the interface is not great about communicating the magic, in some ways.

This PR is a draft because it's not yet decided exactly what the into shape interface should be. If we could, the order argument should always have been present. But we'd like to be backwards compatible as far as possible as we can, there must be many thousands of into shape calls out there.

Here's what the draft does:

  • into_shape is unchanged but deprecated
  • .into_shape_with_order((2, 3)) same as Order::RowMajor below
  • .into_shape_with_order(((2, 3), Order::RowMajor)) reshapes row major arrays, errors on other.
  • .into_shape_with_order(((2, 3), Order::ColMajor)) reshapes col major arrays, errors on other.

bluss avatar Jul 25 '23 22:07 bluss

This PR is a draft because it's not yet decided exactly what the into shape interface should be.

You've made it like to_shape (using ShapeArg) This is coherent and thus more simple for the users, and it's backward compatible. Seems good!

nilgoyette avatar Jul 26 '23 01:07 nilgoyette

to_shape assumes RowMajor when nothing is specified by the caller, while into_shape picks whichever seems to fit with the array that is given (true for into_shape both before this PR and after it in its current shape).

to_shape is great because its only precondition for success (not error) is that the requested shape has the same number of elements. It can handle either ordering request. Simple math for the user :)

into_shape is great because it can reshape views. But it has a precondition that the input array is exactly C-contig or F-contig, unfortunately.

Reshaping by following RowMajor or ColumnMajor index ordering has different effects, it has a different result. This is why it's quite dangerous to do what into_shape does, which is to just pick whatever seems to fit with the given array. The distinction might have been invisible to the user.

It's a documentation and API issue, how to, for into_shape and reshaping in general

  • Continue providing easy to use APIs
  • Make the preconditions clear
  • Find the right level of "Do What I Mean"

I'd like feedback on the tradeoffs involved. Continuing like this, with into_shape and to_shape having subtly different defaults might be ok? But it's kind of a sleeping footgun. More docs and more focus on using to_shape as a primary option (it's already placed more prominently in the docs, but I guess old code and examples use into shape throughout).

bluss avatar Jul 26 '23 09:07 bluss

If I could, I would deprecate and remove into_shape (because it has the "wrong" ordering defaults.). Then add it back with the right defaults.

But I think that causes too much churn in user code, so I'm looking for the second best option.

bluss avatar Jul 30 '23 11:07 bluss

If I could, I would deprecate and remove into_shape (because it has the "wrong" ordering defaults.). Then add it back with the right defaults.

I hesitate to give feedback on subjects that I don't feel confident in, but in this particular case... IMO, you should do exactly what you wrote

  • having a "sleeping footgun" is much worse than deprecating a method.
  • you're not at v1 so a breaking change is not unexpected for the users
  • "now" is the best time to do something that could/should have been done ealier :)

nilgoyette avatar Jul 30 '23 18:07 nilgoyette

For me, into_shape is a very commonly used method so it feels like it would break every program out there. Hence I'm looking for a smoother transition than this.

Even contemplating doing something like

  • into_shape() deprecated
  • into_shape_with_order() new

And after the deprecation period, into_shape_with_order is renamed to into_shape. Maybe the same dance for reshape.

bluss avatar Aug 02 '23 18:08 bluss

@adamreichold if you have time, would be cool if you had a look. I think it's ready to go

bluss avatar Mar 09 '24 13:03 bluss

I plan to but I am not sure when this weekend I will be able to look into it. Are you on a schedule here or would merging it the beginning of next still be fine?

adamreichold avatar Mar 09 '24 13:03 adamreichold

@adamreichold no stress. It's not on any schedule. I've been away for months and so on, would be weird and rude if I asked that anything would suddenly move at a faster pace(!) so I'm not going to do that :slightly_smiling_face: As one of the main maintainers I will continue to sometimes just decide to merge some changes on my own, which is a way I want all the maintainers to work, merge if they think it doesn't require more feedback.

bluss avatar Mar 09 '24 13:03 bluss

Every PR gets an entry in RELEASE.md, but normally, I've been writing that at release time. (Would be great to have a merge conflict free way of doing it in the PR, like the CPython project does.). As you say, this requires a bigger notice.

bluss avatar Mar 10 '24 10:03 bluss

(Would be great to have a merge conflict free way of doing it in the PR, like the CPython project does.)

Over at PyO3, we use Towncrier to semi-automatically produce a changelog from separate "newsfragement" Markdown files. The affinity to Python tooling in general made the choice a bit easier, but I suspect there exist similar tools written in Rust if required.

It does entail quite a bit of setup work up front, but once done writing the changelog entries becomes a breeze (and we even check their presence via the CI via a label to suppress that if the PR does not require an entry).

adamreichold avatar Mar 10 '24 10:03 adamreichold

Thank you. I have learned a bit (doubtful, but maybe) from other maintainers and trying to be more relaxed about the maintainership style. This is good enough, let's go.

With that said, I'll still try my best to say no to code that breaks Rust's rules (relevant for safety), which we can often spot in code reviews, even if it takes some effort..

bluss avatar Mar 10 '24 12:03 bluss