motoko-base
motoko-base copied to clipboard
feat: adds substring method for Text with tests
@crusso Do you think it makes sense to expose str[i]
or even str[i..j]
as primitives? If the base library implementation ever becomes a bottleneck.
Correct index-based access is a linear-time operation on Unicode strings, which immediately makes any looping based on it quadratic. It hence was a very deliberate decision to favour iterators and not to support random access on Text in Motoko, nor any other index-based operations.
We have discussed in the past that a subtext operator(*) should hence take two iterators as boundaries (which also eliminates the out-of-bounds error case), i.e., something like:
extract : (text : Text, start : Iter<Text>, end : Iter<Text>) -> Text
where end
is exclusive.
The open problem, as I remember it, was how to ensure that the iterators belong to the text string in question. That will have to be a dynamic check, but that and extracting the position would require piercing the closure representing the iterator. (Maybe a more bearable solution is to introduce a subtype TextIter
that has an additional field data : TextIterData
, where TextIterData
is an opaque primitive type that encapsulates subject text and position; its opaqueness guarantees that the value is well-formed.)
(*) substring
wouldn't be a fitting name in Motoko, since the type is called Text.
I agree that the substring name may not be fitting, but I think that passing Iters
instead of Nats
is really unintuitive, as a consumer of the API.
Could we call this Text.slice
?
There isn't really a useful alternative to an iterators-based API. Because how would you compute an index? The only available way is by iterating over the text -- with an iterator. What would be the point in having to convert that into an index first?
It's perhaps unusual from a JS perspective, but there are other container libraries that work that way as well.
How do iterators as boundaries work? How do I have to define them to get the boundaries I want?
How do iterators as boundaries work? How do I have to define them to get the boundaries I want?
Yeah, I'm curious about that too.
In the past, I've want to represent a Text slice as a pair of private and cloneable text iterator, plus length. Iterating the slice clones the iterator and enumerates (up to) length values. But our text iterators don't support cloning at the moment, though I don't think that would be too hard to add.