motoko-base icon indicating copy to clipboard operation
motoko-base copied to clipboard

feat: adds substring method for Text with tests

Open krpeacock opened this issue 2 years ago • 6 comments

krpeacock avatar Feb 13 '23 18:02 krpeacock

@crusso Do you think it makes sense to expose str[i] or even str[i..j] as primitives? If the base library implementation ever becomes a bottleneck.

chenyan-dfinity avatar Feb 13 '23 21:02 chenyan-dfinity

Correct index-based access is a linear-time operation on Unicode strings, which immediately makes any looping based on it quadratic. It hence was a very deliberate decision to favour iterators and not to support random access on Text in Motoko, nor any other index-based operations.

We have discussed in the past that a subtext operator(*) should hence take two iterators as boundaries (which also eliminates the out-of-bounds error case), i.e., something like:

extract : (text : Text, start : Iter<Text>, end : Iter<Text>) -> Text

where end is exclusive.

The open problem, as I remember it, was how to ensure that the iterators belong to the text string in question. That will have to be a dynamic check, but that and extracting the position would require piercing the closure representing the iterator. (Maybe a more bearable solution is to introduce a subtype TextIter that has an additional field data : TextIterData, where TextIterData is an opaque primitive type that encapsulates subject text and position; its opaqueness guarantees that the value is well-formed.)

(*) substring wouldn't be a fitting name in Motoko, since the type is called Text.

rossberg avatar Feb 14 '23 09:02 rossberg

I agree that the substring name may not be fitting, but I think that passing Iters instead of Nats is really unintuitive, as a consumer of the API.

Could we call this Text.slice?

krpeacock avatar Feb 16 '23 15:02 krpeacock

There isn't really a useful alternative to an iterators-based API. Because how would you compute an index? The only available way is by iterating over the text -- with an iterator. What would be the point in having to convert that into an index first?

It's perhaps unusual from a JS perspective, but there are other container libraries that work that way as well.

rossberg avatar Feb 17 '23 14:02 rossberg

How do iterators as boundaries work? How do I have to define them to get the boundaries I want?

timohanke avatar Feb 24 '23 08:02 timohanke

How do iterators as boundaries work? How do I have to define them to get the boundaries I want?

Yeah, I'm curious about that too.

In the past, I've want to represent a Text slice as a pair of private and cloneable text iterator, plus length. Iterating the slice clones the iterator and enumerates (up to) length values. But our text iterators don't support cloning at the moment, though I don't think that would be too hard to add.

crusso avatar Mar 01 '23 17:03 crusso