
string type layout

CeleritasCelery opened this issue 2 years ago • 4 comments (status: open)

There is a conflict between the Rust world and the elisp world. Rust enforces aliasing rules at compile time, while elisp lets you alias anything you want. We need a solution that makes both of these worlds happy.

Elisp strings are mutable. This might not be a big deal, except that not all characters (code points) are the same size in UTF-8, so mutating a string may require reallocating it. So we need some way to mutate aliased strings in Rust.

Take the following code sample:

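    let str1 = "foo".into_obj();
    let str2 = str1; // str2 now aliases the same Lisp string as str1
    mutate(str1.untag(), str2.untag());
    
    // The returned slice borrows from str1, so the lifetime must be spelled out
    // (elision cannot choose between two reference parameters).
    fn mutate<'a>(str1: &'a LispString, str2: &LispString) -> &'a str {
        let slice: &str = str1.get(); // take an immutable reference through str1
        // Mutate the same string through str2. 'å' is 2 bytes in UTF-8 but 'f' is 1,
        // so the buffer must be reallocated, freeing the memory `slice` points into.
        str2.set_at(0, 'å');
        slice // return the now-invalidated slice: a use-after-free
    }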

We need to find some way to handle this situation.

1 - current solution: RefCell

The easiest way to handle this from an implementation point of view is RefCell, and this is how things are currently set up. However, it comes with some big downsides. First, it adds overhead to all string access, including immutable access. Second, and probably more important, it opens up the possibility of runtime panics. Mutating a string should never be an error (unless the string is const).
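To make both costs concrete, here is a minimal std-only sketch. `RefCellString`, `get`, and `try_set_at` are hypothetical names, not rune's actual `LispString` API. Every read pays a borrow-flag check, and a write while a reader is alive fails at runtime: `try_borrow_mut` reports that as an `Err`, whereas the plain `borrow_mut` would panic.

```rust
use std::cell::{BorrowMutError, Ref, RefCell};

/// Hypothetical stand-in for a RefCell-backed LispString (not rune's real type).
pub struct RefCellString {
    inner: RefCell<String>,
}

impl RefCellString {
    pub fn new(s: &str) -> Self {
        Self { inner: RefCell::new(s.to_owned()) }
    }

    /// Every read goes through `borrow()`, which checks and bumps a runtime borrow counter.
    pub fn get(&self) -> Ref<'_, String> {
        self.inner.borrow()
    }

    /// Replace the character at char index `idx`. `borrow_mut()` would panic if a
    /// reader still holds a `Ref`; this version surfaces that as an `Err` instead.
    pub fn try_set_at(&self, idx: usize, c: char) -> Result<(), BorrowMutError> {
        let mut s = self.inner.try_borrow_mut()?;
        let (start, old) = s.char_indices().nth(idx).expect("char index out of range");
        let mut buf = [0u8; 4];
        // replace_range may grow (and reallocate) the String when `c` is wider than `old`.
        s.replace_range(start..start + old.len_utf8(), c.encode_utf8(&mut buf));
        Ok(())
    }
}
```

With a `Ref` from `get` still alive, `try_set_at(0, 'å')` returns `Err`; once that reader is dropped, the same call succeeds and the string becomes "åoo".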

2 - copy on write

Since the problem is that mutation invalidates every existing reference to the string, we could make a copy instead. Any time you mutate a string, the old string buffer stays valid until the next garbage collection, and the string's "current" buffer is updated to point to the new copy.

This has the advantage of being simple to implement, but it makes mutation expensive (every mutation copies the whole string). Probably the only reason to use string mutation from elisp is performance, and that benefit would be gone. This might be okay, because string mutation is a relatively rare operation in elisp.
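A minimal sketch of copy-on-write, under stated assumptions: `CowString`, `set_at`, `buffer_count`, and `collect` are hypothetical names, and `collect` stands in for a garbage-collection pass. Each mutation copies into a fresh buffer while older buffers stay alive, so a `&str` taken before the mutation (like `slice` in the example above) remains valid. Because `collect` needs `&mut self`, the borrow checker refuses to run it while such a `&str` is still in use.

```rust
use std::cell::UnsafeCell;

/// Hypothetical copy-on-write string. Each mutation allocates a new buffer;
/// old buffers stay alive until `collect` (a stand-in for a GC pass).
pub struct CowString {
    // Raw pointers from `Box::into_raw`, so growing this Vec never moves or
    // invalidates the string data that outstanding `&str`s point into.
    buffers: UnsafeCell<Vec<*mut str>>,
}

impl CowString {
    pub fn new(s: &str) -> Self {
        let ptr = Box::into_raw(Box::<str>::from(s));
        Self { buffers: UnsafeCell::new(vec![ptr]) }
    }

    /// Read the current buffer.
    pub fn get(&self) -> &str {
        // SAFETY: `buffers` is never empty, and buffers are only freed in
        // `collect`/`drop`, which need `&mut self`, so no `&str` outlives them.
        unsafe {
            let bufs = &*self.buffers.get();
            &**bufs.last().expect("at least one buffer")
        }
    }

    /// Replace the char at char index `idx` by copying into a fresh buffer.
    pub fn set_at(&self, idx: usize, c: char) {
        let old = self.get();
        assert!(idx < old.chars().count(), "char index out of range");
        let new: String = old
            .chars()
            .enumerate()
            .map(|(i, ch)| if i == idx { c } else { ch })
            .collect();
        let ptr = Box::into_raw(new.into_boxed_str());
        // SAFETY: no reference into the Vec itself is live here (only into the
        // heap buffers it points to), and UnsafeCell makes CowString !Sync.
        unsafe { (*self.buffers.get()).push(ptr) };
    }

    /// Number of buffers still alive (the current one plus uncollected ones).
    pub fn buffer_count(&self) -> usize {
        // SAFETY: short-lived shared read of the Vec.
        unsafe { (*self.buffers.get()).len() }
    }

    /// Free every buffer except the current one.
    pub fn collect(&mut self) {
        let bufs = self.buffers.get_mut();
        let current = bufs.pop().expect("at least one buffer");
        for old in bufs.drain(..) {
            // SAFETY: each pointer came from `Box::into_raw` and is freed exactly once.
            unsafe { drop(Box::from_raw(old)) };
        }
        bufs.push(current);
    }
}

impl Drop for CowString {
    fn drop(&mut self) {
        for ptr in self.buffers.get_mut().drain(..) {
            // SAFETY: each pointer came from `Box::into_raw` and is freed exactly once.
            unsafe { drop(Box::from_raw(ptr)) };
        }
    }
}
```

The full copy in `set_at` is where the extra cost of this approach comes from; in exchange, no reader is ever invalidated by a write.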

3 - unsafe

There are only a few functions that actually mutate strings from elisp:

  • aset
  • store-substring
  • clear-string

Maybe it would be worth it to just mark mutation as unsafe and require the caller to ensure no aliasing happens? There would not be many unsafe blocks to inspect. This would be fine as long as the mutating subrs are only called from elisp, but if another Rust function calls them, all bets are off.
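A minimal sketch of what that unsafe contract could look like, with hypothetical names (`UnsafeString`, `set_at`): reads carry no runtime check, and the `# Safety` section states what every caller must prove. Nothing stops Rust code from writing `let r = s.get(); unsafe { s.set_at(0, 'å') };` and then using `r`; that compiles and is undefined behavior, which is the risk described above.

```rust
use std::cell::UnsafeCell;

/// Hypothetical string whose mutation is `unsafe`, pushing the aliasing proof
/// onto the caller (a sketch of option 3, not rune's real API).
pub struct UnsafeString {
    inner: UnsafeCell<String>,
}

impl UnsafeString {
    pub fn new(s: &str) -> Self {
        Self { inner: UnsafeCell::new(s.to_owned()) }
    }

    /// Immutable access with no runtime check.
    pub fn get(&self) -> &str {
        // SAFETY: only `set_at` writes, and its contract forbids live `&str`s.
        unsafe { (*self.inner.get()).as_str() }
    }

    /// Replace the char at char index `idx`; may reallocate the buffer.
    ///
    /// # Safety
    /// No `&str` previously returned by `get` may still be alive, because the
    /// reallocation would leave it dangling.
    pub unsafe fn set_at(&self, idx: usize, c: char) {
        // SAFETY: the caller guarantees exclusive access for the duration of this call.
        let s = unsafe { &mut *self.inner.get() };
        let (start, old) = s.char_indices().nth(idx).expect("char index out of range");
        let mut buf = [0u8; 4];
        s.replace_range(start..start + old.len_utf8(), c.encode_utf8(&mut buf));
    }
}
```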

CeleritasCelery, Jan 14 '23 19:01

What if we keep two formats of string:

  1. Rust strings
  2. vector of chars

Some operations will demand 1) (inserts) and some 2) (regex). We convert to whatever we need, in the hope that strings we run regex over won't be inserted into much, and vice versa (is this true?)
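One way to sketch this proposal, assuming a single owner (`&mut self`) so that aliasing is set aside for now; `DualString` and its methods are hypothetical. The string stays in whichever form the last operation asked for, and switching forms costs one O(n) conversion, so the hoped-for win depends on operations on a given string rarely alternating between the two forms.

```rust
/// Hypothetical dual-format string that converts on demand.
enum Repr {
    Utf8(String),     // compact, variable-width encoding
    Chars(Vec<char>), // fixed 4 bytes per char; replacing a char never resizes
}

pub struct DualString {
    repr: Repr,
}

impl DualString {
    pub fn new(s: &str) -> Self {
        Self { repr: Repr::Utf8(s.to_owned()) }
    }

    /// Get the UTF-8 form, converting from chars first if needed (O(n) on a switch).
    pub fn as_utf8(&mut self) -> &str {
        if let Repr::Chars(chars) = &self.repr {
            let s: String = chars.iter().collect();
            self.repr = Repr::Utf8(s);
        }
        match &self.repr {
            Repr::Utf8(s) => s.as_str(),
            Repr::Chars(_) => unreachable!("converted above"),
        }
    }

    /// Get the char-vector form, converting from UTF-8 first if needed.
    pub fn as_chars(&mut self) -> &mut Vec<char> {
        if let Repr::Utf8(s) = &self.repr {
            let chars: Vec<char> = s.chars().collect();
            self.repr = Repr::Chars(chars);
        }
        match &mut self.repr {
            Repr::Chars(chars) => chars,
            Repr::Utf8(_) => unreachable!("converted above"),
        }
    }

    /// Replace a char in place; O(1) once the string is in char form.
    pub fn set_at(&mut self, idx: usize, c: char) {
        self.as_chars()[idx] = c;
    }
}
```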

Alan-Chen99, May 02 '23 00:05

Emacs Lisp does not allow insertions into strings (thankfully), so that isn't something we have to design for. However, it does let you mutate strings by replacing characters. This is almost as bad, because not all "characters" are the same size in UTF-8, so you may have to grow the backing array to update the string. That being said, these operations are not very common.

By "vector of chars", I assume you mean Vec<u8>? That is what the regex engine consumes.

The real problem here is that Lisp objects can freely alias, so we don't know how many immutable references to the string exist at any given time.

CeleritasCelery, May 02 '23 04:05

By "vector of chars" I mean a Vec of a char type that can hold any character. Aliasing just means that we need to "box" the string: to mutate it, we change the contents of the "box", so the string object itself has a constant size and a consistent location.
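One possible reading of this idea, sketched with std types only (`BoxedChars` and its methods are hypothetical): store each character in a fixed-width `Cell<char>` inside a `Box<[_]>`. Because every `char` is 4 bytes, replacing one never resizes or moves the buffer, and `Cell` permits the write through a shared reference, so aliased views stay valid without `unsafe` or runtime borrow checks. The trade-off is 4 bytes per character and an O(n) conversion whenever a UTF-8 string or byte slice is needed (for example, by the regex engine).

```rust
use std::cell::Cell;

/// Hypothetical "boxed" string: a fixed-width char buffer that never moves
/// or resizes, with per-character interior mutability.
pub struct BoxedChars {
    chars: Box<[Cell<char>]>,
}

impl BoxedChars {
    pub fn new(s: &str) -> Self {
        Self { chars: s.chars().map(Cell::new).collect() }
    }

    /// Shared view; stays valid across `set_at` because the buffer never reallocates.
    pub fn view(&self) -> &[Cell<char>] {
        &self.chars
    }

    /// Replace a char through a shared reference; every char is 4 bytes, so no resize.
    pub fn set_at(&self, idx: usize, c: char) {
        self.chars[idx].set(c);
    }

    /// Materialize a UTF-8 Rust String (an O(n) copy).
    pub fn to_rust_string(&self) -> String {
        self.chars.iter().map(Cell::get).collect()
    }
}
```

Here a `view` taken before a write still observes the new character afterwards, since both go through the same fixed buffer.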

Alan-Chen99, May 04 '23 03:05

> Aliasing just means that we need to "box" the string and to mutate we change the contents of the "box" so the actual string has constant size and consistent location

That is a good idea, similar to copy-on-write above.

CeleritasCelery, May 04 '23 13:05