cl-str Introduce negative indices

Hi,

as an improvement, would you be interested in making indices accept negative numbers, which apparently is something that other languages (e.g. python) offer?

Based on the existing code in cl-str I propose the semantics of the index to be defined as follows:

(defun index (string pos)
  "Modular/saturating position in string.

Returns an index between 0 and (LENGTH STRING), inclusive.

If POS is NIL, the returned value is the length of STRING,
to be consistent with exclusive end indices.

Otherwise POS must be an integer.

If POS is positive, it is clamped to be at most the length
of the string.

If POS is negative and its magnitude is lower than the
length of string, the returned index is meant to represent
an offset from the end.

Otherwise, if POS is negative and its magnitude is greater
than the length, the returned index is 0.

EXAMPLES:

   (flet ((is (pos res) (assert (= (index \"abcde\" pos) res))))
     ;; nil case
     (is nil  5)

     ;; positive
     (is 3  3)
     (is 4  4)
     (is 5  5)
     (is 10 5)

     ;; negative
     (is -1   4)
     (is -2   3)
     (is -3   2)
     (is -4   1)
     (is -5   0)
     (is -10  0))
  "
  (check-type pos (or null integer))
  (with-accessors ((length length)) string
    (cond
      ((not pos) length)
      ((>= pos 0) (min pos length))
      ((< (abs pos) length) (+ length pos))
      (t 0))))

If you are interested, I can add a patch that process indices in the existing functions through index to support negative indices, something that should be backward compatible AFAIK.

ok, this is not backward compatible considering substring clamps the index, so if some code relies on negative indices being clamped to zero, they'll have regressions.

Sep 29 '20 09:09 christophejunke

On general, I like this feature.

About substring: yes, we should not break it. It seems acceptable to create another function with another behavior.

We can add index, and were you thinking about other functions? split and count-substring have a start argument. They currently don't clamp it, but the benefit is less clear, if not counter-intuitive.

Sep 29 '20 16:09 vindarel

The name for substring I had in mind would be slice.

I was thinking more generally about all the places that accept an index. It changes not what functions do, but how positions are written. For example insert in last position, while not being the most idiomatic way to do it, would be (insert #\x -1 s). Same for split, e.g. you just read N characters from a stream and want to extract tokens starting at -N.

I understand this breaks functions for negative indices, on the other hand I don't know how much the API is fixed at this point or if you want to introduce new symbols instead (e.g. in alexandria no more new symbols are allowed because it is used a lot).

Basically, I may be wrong but it seems to me that the library has still some capacity to change for those corner cases (provided the feature is wanted), so it might be worth considering having small breaking changes.

It goes without saying that I am ok with whatever decisions you'll end up making about this.

Sep 29 '20 17:09 christophejunke

Also, this could be subject to another configuration variable: (1) this would avoid having breaking changes, and (2) the suggested index computation above is not necessarily what most people expect (maybe we want to have errors instead of saturating the index). A special *index-mode* variable could help customize the behaviour on out-of-range indices (at the cost of a bit more complexity).

Sep 29 '20 18:09 christophejunke

substring -> slice

+1 for slice.

insert

looks doable. With a warning first, and maybe with a key argument to control the behavior? If someone speaks up we can be more conservative.

split

The current start and end arguments are untouched.

OK it can be a nice feature to have.

Let's print a warning in case of negative indices for 3/4 Quicklisp releases then.

New symbols

Yes we can introduce new symbols. We can, and I don't think this little is used, I expect everybody to use the str: prefix.

By the way, we could make that a rule and then re-use reserved symbols from the :cl package (search,…).

I am not seduced by the *index-mode* possibility. We can rather create new functions (or another package).

Sep 30 '20 18:09 vindarel