chibi-scheme Add a feature to cache the most recent string index->cursor result

~~Not ready to merge yet. Needs more testing and benchmarks.~~

This is lighter-weight than building a full index->cursor table for the string, adding a constant two words to the memory required to store a string, as opposed to one word for every n characters. The cached cursor is used for any string-ref operation requesting an index after the most-recently-requested index, making potentially quadratic repeated string-ref procedures run in linear time. In theory, it could also use a heuristic to speed up moving backwards through the string when it thinks that moving the old cursor backwards would be faster than starting again at the start of the string. In practice, my logging of when the cached cursor is actually reused during the Chibi compilation and startup process shows that the most common case of moving backwards is going back to the start of the string anyway. (EDIT: Oh, but there are cases like string-index-right where an efficient implementation would want to move backwards character-by-character from the end of the string.)

I’m unsure how well this interacts with threading in Chibi. As I understand it, Chibi’s green threads will never interrupt a C function, so there should be no races regards getting and setting the cache. If multiple threads are accessing the same string at the same time, they’ll trample on each others’ use of the cursor — but it should still be quicker, in growth terms, for both threads than would be just starting at cursor/index 0 each time.

Benchmarks to follow. It remains to be seen if the cost of checking and setting the cursor cache actually improves performance overall, especially considering the impact on small strings. If results are favourable, I think it might even be reasonable to enable this option by default.

Input, especially code review, appreciated.

Dec 10 '21 20:12 dpk

Preliminary (albeit likely unfairly flattering) benchmarks show that this patch does actually marginally speed up access to even very small strings. With the default SEXP_STRING_INDEX_TABLE_CHUNK_SIZE of 64, the results are most noticeable on strings slightly longer than 76 characters, the minimum to trigger Chibi into generating the offset table at all. With these strings SEXP_USE_STRING_INDEX_TABLE has a notable negative performance impact, whereas the new SEXP_USE_STRING_REF_CACHE is slightly faster (by an insignificant amount, possibly a sampling error). I’ve tested with strings in English (ASCII-only), German (mostly ASCII), Ancient Greek (mostly 2 bytes per character in UTF-8), and Sanskrit (mostly three bytes per character in UTF-8). I’ll post my benchmark code when I’ve tested longer strings.

Results on tiny strings (between 5 and 12 characters long):

vanilla          36.84 real        36.33 user         0.18 sys
string-index     37.72 real        36.52 user         0.15 sys
string-ref-cache 35.29 real        34.93 user         0.12 sys

Results on small strings (all just over 76 characters):

vanilla          105.69 real       104.41 user         0.52 sys
string-index     111.58 real       110.23 user         0.49 sys
string-ref-cache 104.66 real       103.59 user         0.40 sys

Mostly in the weeds so far, though it’s encouraging to see that it doesn’t seem to have any negative time impact. (Practical memory impact is yet to be tested.) As remarked in features.h, the big improvements, if any, will come with very large strings.

Dec 10 '21 22:12 dpk

I'm OK with even a small hit in string-ref performance if it helps some common quadratic patterns. This doesn't slow down the cursor API or any of the high-level APIs built on cursors.

The only real concern is the memory overhead. Chibi assumes 4-word aligned heap objects, so your change increases the current 4-word string wrapper to 8 words (not counting the underlying byte vector).

The other minor concern is this might encourage people to think string-ref is fast for all cases in Chibi and they don't need cursors, when in fact random access would still be slow.

Dec 10 '21 22:12 ashinn

I'm OK with even a small hit in string-ref performance if it helps some common quadratic patterns. This doesn't slow down the cursor API or any of the high-level APIs built on cursors.

Okay, good to know. So far, there doesn’t appear to be a speed performance hit, as you can see above. I’ll have to see what effect changing it to be able to optimize string accesses backwards as well as forwards has.

It just occurred to me that the same cache can be used to speed up sexp_string_cursor_to_index as well as sexp_string_index_to_cursor. I’ll also see about extending my patch to do that. That should also not slow down doing things the pure string-cursor way: the APIs which only use cursors seemingly leave sexp_string_cursor_to_index almost entirely untouched.

The only real concern is the memory overhead. Chibi assumes 4-word aligned heap objects, so your change increases the current 4-word string wrapper to 8 words (not counting the underlying byte vector).

Okay, good to know. How should I measure memory usage? SEXP_USE_DEBUG_GC or external tools or both? If the former, what figures should I look at particularly? The speed tests above are from Mac OS’s /usr/bin/time, which can’t measure memory usage, but I’ve installed GNU time which can.

The other minor concern is this might encourage people to think string-ref is fast for all cases in Chibi and they don't need cursors, when in fact random access would still be slow.

How common is truly random access to strings? I’m doubtful that it’s common enough that this need be a concern. The only pseudo-random access pattern I can think of is the one I mentioned above, of two threads simultaneously trying to process the same string. That could be a problem. Certainly people should still be encouraged to use the string cursor APIs when possible, though.

Dec 11 '21 00:12 dpk

You can check the chibi/red/snow image sizes for a start.

Dec 11 '21 00:12 ashinn

Regarding non-linear usage patterns, backtracking regexp and parsing libraries fall into this case, as well as Boyer-Moore (though that tends to be more localized). Even non-backtracking parsers will tend to record submatch indexes to lazily generate substrings, and the caller may extract the substrings in an arbitrary order.

More pathological cases are a space optimized string set or dictionary which concatenates many strings into a single string, storing only the start offsets externally (e.g. in a compact uvector). The entries may be accessed entirely randomly.

Dec 17 '21 02:12 ashinn

You can check the chibi/red/snow image sizes for a start.

For me, they’re all exactly the same size as they are when built without this patch.

Dec 27 '21 18:12 dpk

I’ve possibly discovered a bug in SEXP_USE_STRING_INDEX_TABLE while writing benchmarks. It seems that sometimes the index table can possibly sometimes contain invalid cursors (pointing to partial UTF-8 bytes) — at least that’s the only explanation I can think of for a program which only uses string-ref (no string-set!, no direct use of cursors) coming out with ERROR in next-char on line 25 of file benchmarks/strings/02-random-char-access-file.scm: string-ref: invalid utf8 byte: {String-Cursor #15 5184}. Will investigate.

Dec 28 '21 20:12 dpk

Still haven’t had time to finish working out some practical benchmarks for this, but coincidentally I did just learn that Emacs uses basically the same trick.

(It doesn’t appear to be officially used in the Emacs source, but the name ‘bookmark’ for this is actually pretty good. Much less awkward than ‘string-ref cache’)

Feb 10 '22 18:02 dpk

Thanks for the link Daphne, that's an interesting article. Obviously I prefer the Julia/Go approach :)

Feb 11 '22 04:02 ashinn

Chicken Scheme is also now planning to switch to doing this.

I’ve merged in a branch which lets the bookmark move backwards as well as forwards through the string. I had intended to benchmark that separately, since moving backwards is slower than moving forwards for a few reasons, but in practice the factors involved don’t seem to be large enough for this to be a concern — there‘s still generally a speedup compared to starting from the beginning of the string.

I’m marking this ready for merge, but I’d appreciate a code review. In particular, I’m not sure if my change to the sexp_struct for strings requires a change in _sexp_type_specs — that code is just completely opaque to me. (Could this be why the image sizes weren’t affected?)

Mar 03 '22 17:03 dpk

Apologies for the bump, but I had someone recently asking when this would be merged. Could you take a look at the remaining issue with _sexp_type_specs?

Sep 26 '22 08:09 dpk

Apologies for the delay, I lost track of this. You don't need to touch _sexp_type_specs in this case. I'll take a final look shortly before merging.

Oct 01 '22 12:10 ashinn

LGTM, sorry for the delay.

Oct 19 '22 08:10 ashinn

chibi-scheme chibi-scheme copied to clipboard

Add a feature to cache the most recent string index->cursor result

chibi-scheme
chibi-scheme copied to clipboard