cookbook icon indicating copy to clipboard operation
cookbook copied to clipboard

Accidentally quadratic

Open mnieper opened this issue 1 year ago • 5 comments

The proposed solution for string-split uses string-ref and substring. Both are allowed to be O(n) where n is the length of the string. Thus the solution can be accidentally quadratic on some Schemes. It should not be promoted but rewritten.

https://github.com/schemedoc/cookbook/blob/76456bca59da1f25fefb21d8bb11a510d7da4138/recipes/split-string.md?plain=1#L16

mnieper avatar Jul 08 '23 09:07 mnieper

You can add a warning to the file about the performance on some schemes.

jcubic avatar Jul 08 '23 10:07 jcubic

Note that the cookbook permits (and maybe even encourages) multiple solutions with different tradeoffs for the same problem.

For many (perhaps most) practical uses cases, the best approach is the one with the simplest code.

For string split, the only portable alternatives I can think of are:

  • Use string ports
  • First convert to bytevector, then split the bytevector, then convert back to strings. (Noting that splitting on bytes is not the same as splitting on chars.)

Both involve messier code, and the performance on small strings on most implementations may be worse.

lassik avatar Jul 08 '23 11:07 lassik

I guess that the Cookbook is aimed chiefly at beginners or casual users of Scheme. People will copy-and-paste code and not read the fine print (especially when it is a so-called AI copying the code). This is why I opened this issue and why I think it is very important to think twice about the code offered. Scheme has only a few places like this Cookbook, so I think it is important for the Cookbook to get it right.

The best approach is, in fact, string ports, both for scanning and for constructing the strings. The code would not be messier in any way but very straightforward.

If you are worrying about poor performance for small strings then you can either ignore it because the algorithm is fast for small strings in any case or you can add a small dispatch at the top of the procedure so that really small strings are handled differently.

"Simple" code that would not be accidentally quadratic would convert the string to a list and then use list utilities.

mnieper avatar Jul 08 '23 11:07 mnieper

Well, can you write down either or both of those solutions? If the present solution has no advantages compared to them, we can remove it.

lassik avatar Jul 08 '23 11:07 lassik

Something like this:

(library (string-split)
  (export string-split)
  (import (rnrs))

  (define string-split
    (lambda (delim? s)
      (define who 'string-split)
      (unless (procedure? delim?)
        (assertion-violation who "invalid delim? argument" delim?))
      (unless (string? s)
        (assertion-violation who "invalid string argument" s))
      (let ([p (open-string-input-port s)])
        (define peek-char (lambda () (lookahead-char p)))
        (define read-char (lambda () (get-char p)))
        (define skip-delimiter!
          (lambda ()
            (let ([c (peek-char)])
              (when (and (not (eof-object? c))
                         (delim? c))
                (read-char)
                (skip-delimiter!)))))
        (let f ()
          (skip-delimiter!)
          (let ([c (read-char)])
            (if (eof-object? c)
                '()
                (let-values ([(q extract) (open-string-output-port)])
                  (put-char q c)
                  (let g ()
                    (let ([c (read-char)])
                      (cond
                       [(eof-object? c)
                        (list (extract))]
                       [(delim? c)
                        (cons (extract) (f))]
                       [else
                        (put-char q c)
                        (g)])))))))))))

This is R6RS code, but it can be easily translated to R7RS. Note that the original solution using string-ref is less a problem for R6RS because R6RS should implement string-ref in O(1).

mnieper avatar Jul 08 '23 11:07 mnieper