Add context about `rsplit`'s ordering behavior
Hey folks! We were having a discussion on Slack about the ordering of items with `rsplit`. Right now, the docstring says:
Similar to split, but starting from the end of the string.
It is not intuitive that, despite the function starting from the end, the returned items are all in the same order as when using `split`.
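To make the behaviour in question concrete, here is a minimal sketch (the input string is just an illustrative example) showing that, without a `limit`, `rsplit` returns its pieces in the same left-to-right order as `split`:

```julia
s = "a,b,c"

# Both calls return the pieces in the order they appear in the string.
forward  = split(s, ",")
backward = rsplit(s, ",")

println(forward)   # pieces "a", "b", "c"
println(backward)  # same pieces, same order
```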
Any thoughts on how we can update the syntax or the API itself?
The API is pretty much immutable until 2.0. It is unfortunate that a potential future eachrsplit iterator needs to traverse the string twice due to this behaviour.
You can see some of the discussion here: https://stackoverflow.com/questions/73704673/why-does-base-rsplit-not-invert-the-order-compared-to-base-split-of-the-data-i?noredirect=1#comment130151929_73704673
I looked into how other languages did it, since I assumed this would be a very common function.
To my surprise, I found that a lot of languages (JavaScript, Java, Perl, PHP, MATLAB, R, Crystal, Go) don't seem to have an equivalent to `rsplit`.
Python is the only other language I could find with an `rsplit` similar in behaviour to Julia's.
The standard library version's doc says:
str.rsplit(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones. If sep is not specified or None, any whitespace string is a separator. Except for splitting from the right, rsplit() behaves like split() which is described in detail below.
The last sentence makes an attempt to clarify the order of returned parts, but could be clearer.
Pandas has its own version of rsplit, whose main docstring isn't super clear on this return order, but its "Examples" section does explicitly say:
Without the n parameter, the outputs of rsplit and split are identical.
The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.
with examples for each.
Julia's `rsplit` docs already have examples showing its return order, but the text description of `rsplit` could be more explicit about it (with slight inspiration from the above):
Similar to split, but starting from the end of the string.
The split is performed right-to-left, but the split substrings are returned in the order they appear in the original string, i.e. left-to-right. When the `limit` keyword argument is omitted, the outputs of `rsplit` and `split` are identical.
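As a quick illustration of where the two functions do differ, this sketch (with an arbitrary example string) passes `limit=2` to both. The pieces still come back left-to-right, but `rsplit` spends its one split at the rightmost delimiter:

```julia
s = "a,b,c,d"

# With a limit, split cuts at the leftmost delimiter...
left_first  = split(s, ","; limit=2)
# ...while rsplit cuts at the rightmost one.
right_first = rsplit(s, ","; limit=2)

println(left_first)   # "a" and "b,c,d"
println(right_first)  # "a,b,c" and "d"
```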
(Bonus info about other languages, in case it's useful in the future)
Kotlin and Ruby have something sort of like Julia's `rsplit`, but not quite:
- Kotlin has `substringAfterLast` and `substringBeforeLast`, which cover what seems to be the most common use of `rsplit`: to split with `limit = 1`.
- Ruby has `rpartition`, which is also a `limit = 1` split, but with the twist of including the delimiter in the result.
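For comparison, a single split at the last delimiter can be sketched in Julia as below. Note that Julia's `limit` keyword caps the number of returned pieces, so one split is written `limit=2`. The helper name `split_last` and its fallback for a missing delimiter are hypothetical, chosen only for illustration:

```julia
# Hypothetical helper: split `s` once at the last occurrence of `delim`.
# Returns (before, after); if `delim` is absent, returns (s, "").
function split_last(s::AbstractString, delim::AbstractString)
    parts = rsplit(s, delim; limit=2)
    return length(parts) == 2 ? (parts[1], parts[2]) : (s, "")
end

stem, ext = split_last("archive.tar.gz", ".")
println(stem)  # text before the last "."
println(ext)   # text after the last "."
```

An `rpartition`-style variant would additionally return the delimiter as a middle element.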
Nim has an `rsplit`, and so does Rust, but both of these return the split parts in reverse order, as @logankilpatrick originally expected. (So a change to this in Julia 2.0, if we decide we want it, would have precedent in these.)
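In the meantime, anyone who wants the right-to-left ordering can get it by reversing the result, as in this small sketch (example string chosen arbitrarily). It allocates the full vector first, so it is a workaround rather than a lazy iterator:

```julia
s = "a,b,c,d"

# Reverse the vector returned by rsplit to get rightmost-first order.
rightmost_first = reverse(rsplit(s, ","))
println(rightmost_first)  # "d", "c", "b", "a"

# The same works with a limit: the rightmost piece comes first.
limited = reverse(rsplit(s, ","; limit=2))
println(limited)          # "d" and "a,b,c"
```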