RFI: fancier slicing
Until now slicing is limited to existing positions.
request-for-implementation (e.g. in ./core/Array.carp):
- allow negative indizes counting from the end
- allow omitting the 2nd parameter to indicate "all of it". Since we can not leave it empty (as in Python), one option to indicate this case would be to have a smaller end-index than the start.
Sample implementation (for Strings only, including test cases):
(defn newslice [data start end]
(let [slen (String.length data)
wrap (fn [pos max]
(if (Int.< pos 0) (Int.+ max pos) pos) )
wstart (wrap start slen)
tmpend (wrap end slen)
wend (if (Int.< wstart tmpend) tmpend slen) ]
(String.slice data wstart wend) ))
(defn main []
(let-do [data "0123456789ABC"]
(IO.println &(newslice data 0 -3)) ; all but the last 3 chars
(IO.println &(newslice data 5 0)) ; everything starting at pos 5
(IO.println &(newslice data -3 0)) )) ; the last 3 chars
intended output:
0123456789
56789ABC
ABC
questions:
- Should this replace the old slicing or do we want to keep that (e.g. to abort if negative indizes are passed)?
I had a lot to say about this, so I split this into two sections: Critique and Potential Implementation.
Critique
While I sometimes enjoy this slicing in Python, I’ve found that it often leads to bugs that linger, just because virtually all possible index combinations are valid.
As such, I see the following points:
Pro
- Very expressive.
- One form for a variety of use cases with a compact API.
- Open to extension.
Con
- Leads to what is essentially a DSL with its own pitfalls.
- When variables are used for slicing, the intended behavior is unclear, for instance
(newslice data start end): do I take a prefix, suffix, or possibly just a regular slice?
This would become even more “interesting” if we also adopted Python’s step argument, allowing for things like "hello"[::-2]=="olh" and such. No doubt this is powerful, but it feels supremely unsafe.
I’m not of the opinion that this is a bad feature, but I do wonder whether its value proposition outweighs its potential for bugs.
Potential Implementation
All of these aside, if we do go this way, we could potentially implement a generic slice that relies on nothing but a way to construct an empty collection, take the length, index into it, and push values into a collection. As a sketch:
(defn slice [coll start end]
(let-do [res (zero) ; construct a new collection
l (length coll) ; length of the collection
start (wrap start l)
end (wrap end l)
end (if (Int.< start end) end l)]
(for [i start end]
(push-back! &res ; push values into a collection
(nth coll i) ; index into a collection
))))
Note that this is an O(n) implementation, which is potentially less performant than a specialized implementation, which, in many cases, might reduce to a call to memcpy. It would, however, also allow us to add a step argument, if desired, with ease:
(defn slice [coll start end step]
(let-do [res (zero) ; construct a new collection
l (length coll) ; length of the collection
start (wrap start l)
end (wrap end l)
end (if (Int.< start end) end l)]
(for [i start end step] ; step is taken as is; THIS MIGHT BE UNSAFE
(push-back! &res ; push values into a collection
(nth coll i) ; index into a collection
))))
Now this becomes very similar to how Array.range and for work, which have a similar API and similar pitfalls.
Wow, that's a detailed response. Thank's @hellerve. Here are my 5 cents:
* Leads to what is essentially a DSL with its own pitfalls.
You mean like the (loop) construct in Common Lisp is a language of its own? It's not that bad I hope. And since it's in the standard lib, wouldn't it rather be a "battery included" than much of a problem?
* When variables are used for slicing, the intended behavior is unclear, for instance `(newslice data start end)`: do I take a prefix, suffix, or possibly just a regular slice?
Agreed, it would become a more general tool (though still in quite another league that really 'big guns' like monads) with a somewhat fuzzy interface. And with greater power...
This would become even more “interesting” if we also adopted Python’s step argument
I would suggest to omit the step argument, in order to concentrate on the case which I use on a regular basis.
I would suggest to omit the step argument, in order to concentrate on the case which I use on a regular basis.
Makes sense; though as mentioned Array.range and for support them—would there be a case for uniformity? I’m not sure we need to strive for this in all cases, but I think here that might be useful.