Carp icon indicating copy to clipboard operation
Carp copied to clipboard

RFI: fancier slicing

Open guberathome opened this issue 4 years ago • 3 comments

Until now slicing is limited to existing positions.

request-for-implementation (e.g. in ./core/Array.carp):

  • allow negative indizes counting from the end
  • allow omitting the 2nd parameter to indicate "all of it". Since we can not leave it empty (as in Python), one option to indicate this case would be to have a smaller end-index than the start.

Sample implementation (for Strings only, including test cases):

(defn newslice [data start end]
  (let [slen   (String.length data)
        wrap   (fn [pos max]
                 (if (Int.< pos 0) (Int.+ max pos) pos) ) 
        wstart (wrap start slen)
        tmpend (wrap end   slen)
        wend   (if (Int.< wstart tmpend) tmpend slen) ]
    (String.slice data wstart wend) ))

(defn main []
  (let-do [data "0123456789ABC"]
    (IO.println &(newslice data  0 -3))    ; all but the last 3 chars
    (IO.println &(newslice data  5 0))     ; everything starting at pos 5
    (IO.println &(newslice data -3 0)) ))  ; the last 3 chars

intended output:

0123456789
56789ABC
ABC

questions:

  1. Should this replace the old slicing or do we want to keep that (e.g. to abort if negative indizes are passed)?

guberathome avatar Jun 18 '21 09:06 guberathome

I had a lot to say about this, so I split this into two sections: Critique and Potential Implementation.

Critique

While I sometimes enjoy this slicing in Python, I’ve found that it often leads to bugs that linger, just because virtually all possible index combinations are valid.

As such, I see the following points:

Pro

  • Very expressive.
  • One form for a variety of use cases with a compact API.
  • Open to extension.

Con

  • Leads to what is essentially a DSL with its own pitfalls.
  • When variables are used for slicing, the intended behavior is unclear, for instance (newslice data start end): do I take a prefix, suffix, or possibly just a regular slice?

This would become even more “interesting” if we also adopted Python’s step argument, allowing for things like "hello"[::-2]=="olh" and such. No doubt this is powerful, but it feels supremely unsafe.

I’m not of the opinion that this is a bad feature, but I do wonder whether its value proposition outweighs its potential for bugs.

Potential Implementation

All of these aside, if we do go this way, we could potentially implement a generic slice that relies on nothing but a way to construct an empty collection, take the length, index into it, and push values into a collection. As a sketch:

(defn slice [coll start end]
  (let-do [res (zero) ; construct a new collection
           l (length coll) ; length of the collection
           start (wrap start l)
           end (wrap end l)
           end (if (Int.< start end) end l)]
      (for [i start end]
        (push-back! &res ; push values into a collection
          (nth coll i) ; index into a collection
         ))))

Note that this is an O(n) implementation, which is potentially less performant than a specialized implementation, which, in many cases, might reduce to a call to memcpy. It would, however, also allow us to add a step argument, if desired, with ease:

(defn slice [coll start end step]
 (let-do [res (zero) ; construct a new collection
          l (length coll) ; length of the collection
          start (wrap start l)
          end (wrap end l)
          end (if (Int.< start end) end l)]
     (for [i start end step] ; step is taken as is; THIS MIGHT BE UNSAFE
       (push-back! &res ; push values into a collection
         (nth coll i) ; index into a collection
        ))))

Now this becomes very similar to how Array.range and for work, which have a similar API and similar pitfalls.

hellerve avatar Jun 18 '21 10:06 hellerve

Wow, that's a detailed response. Thank's @hellerve. Here are my 5 cents:

* Leads to what is essentially a DSL with its own pitfalls.

You mean like the (loop) construct in Common Lisp is a language of its own? It's not that bad I hope. And since it's in the standard lib, wouldn't it rather be a "battery included" than much of a problem?

* When variables are used for slicing, the intended behavior is unclear, for instance `(newslice data start end)`: do I take a prefix, suffix, or possibly just a regular slice?

Agreed, it would become a more general tool (though still in quite another league that really 'big guns' like monads) with a somewhat fuzzy interface. And with greater power...

This would become even more “interesting” if we also adopted Python’s step argument

I would suggest to omit the step argument, in order to concentrate on the case which I use on a regular basis.

guberathome avatar Jun 19 '21 07:06 guberathome

I would suggest to omit the step argument, in order to concentrate on the case which I use on a regular basis.

Makes sense; though as mentioned Array.range and for support them—would there be a case for uniformity? I’m not sure we need to strive for this in all cases, but I think here that might be useful.

hellerve avatar Jun 23 '21 20:06 hellerve