cookbook
cookbook copied to clipboard
Accidentally quadratic
The proposed solution for string-split
uses string-ref
and substring
. Both are allowed to be O(n) where n is the length of the string. Thus the solution can be accidentally quadratic on some Schemes. It should not be promoted but rewritten.
https://github.com/schemedoc/cookbook/blob/76456bca59da1f25fefb21d8bb11a510d7da4138/recipes/split-string.md?plain=1#L16
You can add a warning to the file about the performance on some schemes.
Note that the cookbook permits (and maybe even encourages) multiple solutions with different tradeoffs for the same problem.
For many (perhaps most) practical uses cases, the best approach is the one with the simplest code.
For string split, the only portable alternatives I can think of are:
- Use string ports
- First convert to bytevector, then split the bytevector, then convert back to strings. (Noting that splitting on bytes is not the same as splitting on chars.)
Both involve messier code, and the performance on small strings on most implementations may be worse.
I guess that the Cookbook is aimed chiefly at beginners or casual users of Scheme. People will copy-and-paste code and not read the fine print (especially when it is a so-called AI copying the code). This is why I opened this issue and why I think it is very important to think twice about the code offered. Scheme has only a few places like this Cookbook, so I think it is important for the Cookbook to get it right.
The best approach is, in fact, string ports, both for scanning and for constructing the strings. The code would not be messier in any way but very straightforward.
If you are worrying about poor performance for small strings then you can either ignore it because the algorithm is fast for small strings in any case or you can add a small dispatch at the top of the procedure so that really small strings are handled differently.
"Simple" code that would not be accidentally quadratic would convert the string to a list and then use list utilities.
Well, can you write down either or both of those solutions? If the present solution has no advantages compared to them, we can remove it.
Something like this:
(library (string-split)
(export string-split)
(import (rnrs))
(define string-split
(lambda (delim? s)
(define who 'string-split)
(unless (procedure? delim?)
(assertion-violation who "invalid delim? argument" delim?))
(unless (string? s)
(assertion-violation who "invalid string argument" s))
(let ([p (open-string-input-port s)])
(define peek-char (lambda () (lookahead-char p)))
(define read-char (lambda () (get-char p)))
(define skip-delimiter!
(lambda ()
(let ([c (peek-char)])
(when (and (not (eof-object? c))
(delim? c))
(read-char)
(skip-delimiter!)))))
(let f ()
(skip-delimiter!)
(let ([c (read-char)])
(if (eof-object? c)
'()
(let-values ([(q extract) (open-string-output-port)])
(put-char q c)
(let g ()
(let ([c (read-char)])
(cond
[(eof-object? c)
(list (extract))]
[(delim? c)
(cons (extract) (f))]
[else
(put-char q c)
(g)])))))))))))
This is R6RS code, but it can be easily translated to R7RS. Note that the original solution using string-ref
is less a problem for R6RS because R6RS should implement string-ref
in O(1).