mojo icon indicating copy to clipboard operation
mojo copied to clipboard

[stdlib] Refactor slice to use Optionals and fix negative step slices

Open bgreni opened this issue 9 months ago • 6 comments

  • Use Optional for Slice start and end
  • Fix negative step slice indexing
  • Fix incorrect String unit tests
  • Fix _StridedRange length with mismatched start and end indices

Fixes #2046 Fixes #1944 Fixes #2142

bgreni avatar May 03 '24 22:05 bgreni

Could these tests using out of bounds values be added in test_string.mojo? There should probably be similar tests in test_list.mojo. The new Slice.adjust() itself has no tests yet.

assert_equal("", str[-50::-1])
assert_equal("Hello Mojo!!", str[-50::]) # fails
assert_equal("!!ojoM olleH", str[:-50:-1]) # fails
assert_equal("Hello Mojo!!", str[:50:])  # fails
assert_equal("H", str[::50])
assert_equal("!", str[::-50])
assert_equal("!", str[50::-50]) # fails
assert_equal("H", str[-50::50]) # fails

mikowals avatar May 05 '24 21:05 mikowals

Could these tests using out of bounds values be added in test_string.mojo? There should probably be similar tests in test_list.mojo. The new Slice.adjust() itself has no tests yet.

assert_equal("", str[-50::-1])
assert_equal("Hello Mojo!!", str[-50::]) # fails
assert_equal("!!ojoM olleH", str[:-50:-1]) # fails
assert_equal("Hello Mojo!!", str[:50:])  # fails
assert_equal("H", str[::50])
assert_equal("!", str[::-50])
assert_equal("!", str[50::-50]) # fails
assert_equal("H", str[-50::50]) # fails

@mikowals I actually wasn't aware Python had this behaviour, thank you kindly for the review! I've made the necessary changes so the previously failing cases now pass.

bgreni avatar May 06 '24 01:05 bgreni

Thanks for taking this on @bgreni! It's awesome to see us using types like Optional to make APIs like this more precise and ergonomic 😄

ConnorGray avatar May 08 '24 15:05 ConnorGray

@bgreni A lot has changed over the past two weeks, could you please rebase this and ensure it passes CI?

laszlokindrat avatar May 20 '24 15:05 laszlokindrat

@bgreni A lot has changed over the past two weeks, could you please rebase this and ensure it passes CI?

Sorry I figured the imported-internally tag meant it was out of my hands. I just had to fix fn __getitem__(self, slice: Slice) for Span since it was recently introduced but otherwise all good!

bgreni avatar May 20 '24 23:05 bgreni

!sync

laszlokindrat avatar May 21 '24 18:05 laszlokindrat

!sync

laszlokindrat avatar May 26 '24 13:05 laszlokindrat

I have been wrestling with CI about this, and discovered that internally we depend on incorrect behavior in multiple places. This is unfortunate, but on the way I also learned that Python's slice has an indices method, which is similar to your _adjust method; at the same time, Python does not have a __len__ for slices, and I think this makes sense given the ambiguity. I will remove the __len__ method in favor of a new unsafe_indices. Could you please open a separate PR just for a new Slice.indices method? Once these both land, we can circle back to the original patch.

laszlokindrat avatar May 26 '24 21:05 laszlokindrat

@bgreni I removed Slice.__len__ so you can rebase and get those changes after the next nightly. Let me know if you can implement Slice.indices.

laszlokindrat avatar May 27 '24 10:05 laszlokindrat

@bgreni I removed Slice.__len__ so you can rebase and get those changes after the next nightly. Let me know if you can implement Slice.indices.

Yeah I'll get on that today thanks!

bgreni avatar May 27 '24 15:05 bgreni

@laszlokindrat Just to clarify is the intent to consolidate the current broken behaviour into a Slice.indices method that returns a Range or something with the adjusted start, end, and step values? Then afterwards we return here and integrate those two changes with the corrected behaviour present in this patch?

bgreni avatar May 27 '24 20:05 bgreni

@laszlokindrat Just to clarify is the intent to consolidate the current broken behaviour into a Slice.indices method that returns a Range or something with the adjusted start, end, and step values? Then afterwards we return here and integrate those two changes with the corrected behaviour present in this patch?

You raise a good point. Since the current slice implementation cannot provide a correct Slice.indices method, you can just do it as part of this PR. Please make sure you mention it in the changelog, though! BTW the python API returns a tuple of three ints IIRC. Thanks again for working on this, this will be a huge improvement!

laszlokindrat avatar May 27 '24 20:05 laszlokindrat

@laszlokindrat In working on that I realized slicing a Span is sort of ill-defined at the moment. Since it can't copy the data, negative and > 1 step values don't work as expected. I think it makes sense for Span to be sliceable though and I have an idea to give Span an internal Slice member that could be used in the indexing behaviour to fix this.

Are you ok with leaving the current behaviour for now and I can fix that after this lands in a following patch?

bgreni avatar May 27 '24 22:05 bgreni

@laszlokindrat I've updated everything to use the indices() semantics and I'll rebase on your changes when nightly time hits.

bgreni avatar May 28 '24 02:05 bgreni

!sync

laszlokindrat avatar May 28 '24 19:05 laszlokindrat

@laszlokindrat Do we have some way to re-invoke the tests when they fail due to current flakey ones?

bgreni avatar May 28 '24 20:05 bgreni

@laszlokindrat Do we have some way to re-invoke the tests when they fail due to current flakey ones?

Not sure, maybe you can add an empty commit? We squash and merge internally, so the commit history on your branch is not important.

laszlokindrat avatar May 29 '24 10:05 laszlokindrat

I attempted to marshal this through again, but it seems our internal uses of slice deeply depend on the current (admittedly incorrect) behavior. My suggestion for a plan forward is the following:

  1. Introduce a new, corrected slice type with a new name, e.g. SliceNew and functions like slice_new.
  2. I will internally migrate uses of Slice that I can to this new type.
  3. Rename the existing Slice to SliceOld and the new SliceNew to Slice (and similarly for slice_old/slice_new).
  4. Mark SliceOld and slice_old deprecated.

WDYT?

laszlokindrat avatar May 30 '24 10:05 laszlokindrat

@laszlokindrat Could be a little confusing for nightly users, but if resolving the internal issues is highly burdensome then that would probably be the best way forward?

bgreni avatar May 30 '24 16:05 bgreni

@laszlokindrat Could be a little confusing for nightly users, but if resolving the internal issues is highly burdensome then that would probably be the best way forward?

+1 to @laszlokindrat's approach. It's the best approach forward for us internally, as we have a good amount of misuse of it across a variety of teams in the codebase. It'd be good for us to do the incremental approach. In general, it becomes increasingly more complex for our small standard library team to fix the whole company's misuse of certain types and features. Instead, we need to build safe alternatives and incrementally migrate them at times — such as the case here.

JoeLoser avatar May 30 '24 16:05 JoeLoser

Right thanks Joe I understand now. Do want to add an overload __getitem__ for the types in this pr that accepts SliceNew so nightly users can get the correct behaviour even if they can't use the shorthand syntax for it yet?

bgreni avatar May 30 '24 17:05 bgreni

Right thanks Joe I understand now. Do want to add an overload __getitem__ for the types in this pr that accepts SliceNew so nightly users can get the correct behaviour even if they can't use the shorthand syntax for it yet?

Maybe, my main ask is that we do this incrementally, since it's clearly gonna be a non-trivial process. So let's just start with a new type + unit tests, then maybe some helpers and phase it in to builtins, and so on. It might also make sense to add implicit conversion between the new and old to help ease the transition.

laszlokindrat avatar May 30 '24 17:05 laszlokindrat

@laszlokindrat I've opened a new PR with a subset of these changes using the name SliceNew #2894. I'll close this one but we can always refer back to it when the time comes for additional changes to be introduced.

bgreni avatar May 30 '24 18:05 bgreni

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

modularbot avatar Jun 11 '24 02:06 modularbot

@bgreni Some heroics were pulled internally, and we were able to land this patch more or less as is. I will clean up SliceNew and we won't have to phase this in after all. Thanks for being patient with this!

laszlokindrat avatar Jun 11 '24 02:06 laszlokindrat

@laszlokindrat Oh wow how exciting! Thank you for all your work on this!

bgreni avatar Jun 11 '24 04:06 bgreni

Landed in 3cedf4d4fb5d96df1d61ef7907acf3239059c972! Thank you for your contribution 🎉

modularbot avatar Jun 11 '24 05:06 modularbot