String.allocate with NULL as the initial char value causes unexpected behaviors.
-
The assert that checks out of bound access in
String_string_MINUS_set_BANG_usesstrleninstead of the value given tomalloc. This makes the function unusable on empty strings. -
In the core implementation the length of a string is the
strlenof it, and not the initial value given tomalloc, this should be documented.
(let [my-string (String.allocate 10 (Char.from-int 0))]
(do
(println* (String.length &my-string)) ;; Length is 0, not 10
(String.string-set! &my-string 1 'a') ;; Assertion fails (Thinks its out of bounds because it checks that the index is less than `strlen`
))
Running this example as-is with latest Carp from the master branch, resulted in:
[PARSE ERROR] "REPL" (line 4, column 43): unexpected ")" expecting "&", "~", "@", "'", "`", "%@", "%", "(", "$[", "[", "{", "-", digit, "0b", "0x", "#\"", "r\"", "\"", "\\" or letter
Changing the code code to (use \a instead of 'a'):
(let [my-string (String.allocate 10 (Char.from-int 0))]
(do
(println* (String.length &my-string))
(String.string-set! &my-string 1 \a)))
Got rid of the parse error, but resulted in:
I did not understand the form `(external String.allocate (Fn [Int Char] String))` at /home/dev/src/leblowl/Carp/core/String.carp:35:22.
Traceback:
(String.allocate 10 (Char.from-int 0)) at REPL:5:17.
(let [my-string (String.allocate 10 (Char.from-int 0))] (do ...) at REPL:5:1.
So maybe I am running into a separate bug or misusing the language, but this works for demonstrating the index out of bounds:
(def my-string (String.allocate 10 (Char.from-int 0)))
(println* (String.length &my-string))
(String.string-set! &my-string 1 \a)
/home/dev/src/leblowl/Carp/core/carp_string.h:24: bad index: 1 < 0
[RUNTIME ERROR] 'out/Untitled' exited with return value -6.
and:
(String.string-set! &my-string 0 \a)
/home/dev/src/leblowl/Carp/core/carp_string.h:24: bad index: 0 < 0
[RUNTIME ERROR] 'out/Untitled' exited with return value -6.
The "I did not understand the form" error is a small but (probably) easy bug to fix that's been irritating me for a while.
We're just missing a case in the REPL code that can understand "external" forms, since the case is missing, external functions aren't executable at the top level in the REPL, but they work if you wrap them in a function (as you figured out).
I looked into this quickly.
This gives the same error:
鲤 (let [s (allocate 10 \x)] s)
I did not understand the form `(external String.allocate (Fn [Int Char] String))` at ...
Traceback:
(allocate 10 \x) at REPL:15:9.
(let [s (allocate 10 \x)] s) at REPL:15:1.
but this works:
鲤 (allocate 10 \x)
xxxxxxxxxx
So it has something to do with being wrapped inside let..?
If we’re hitting a traceback, we’re missing external as a case for HasStaticCall most likely, and the let is evaluated dynamically rather than statically.
@hellerve Yeah, that's what I thought too but we have a case for it, that's why the second example works above. It somehow stops working when inside a let though.
I found the bug, it was related to applications and static code (you’ll see in my PR). I’m now hunting down the occasional hangs you sometimes find when static code gets mixed up in dynamic code. Hopefully I’ll be able to fix it today, otherwise I’ll split up the PR!
@hellerve Did you find those hangs? Can I go ahead and merge?
You can merge, still working on the hangs. They have to do with expandAll and fixpoints, I’m trying to figure out how to fix it!
@hellerve If you have a repro for that maybe file a separate issue so it doesn't get lost.
Is the solution here to use the actual size of the string literal/allocated
memory when checking bounds in String_string_MINUS_set_BANG_?
I am not super familiar with C, but it looks like it's not possible to reliably get the size of an allocated block of memory without storing that size somewhere. (https://stackoverflow.com/questions/1518711/how-does-free-know-how-much-to-free, https://stackoverflow.com/questions/492384/how-to-find-the-sizeof-a-pointer-pointing-to-an-array)
Should we allocate extra memory in the string and store size information?
That'd be the way to do it.
So far we've stuck to Carp strings being exactly C:s char*, which is nice since it becomes immeditely compatible with C api:s, but also has a lot of drawbacks of course. I think it's something we could consider changing, but it's a pretty big change. The main constraint that has to be kept in that case is that we must support non-malloced (binary embedded) strings as the default. That shouldn't be too hard actually (I think when we tried this the first time we lacked a lot of mechanics, like lifetimes, which made it hard to do in a proper way.)
So in short – we should probably switch to our own custom string, but it's a big overhaul. I think it would be better to do it when everything else has settled down a bit.
I’d just like to add that the other reason why we haven’t done this is to stay compatible with C APIs (just handing over plain char*s makes using C easier). That’s the reason #119 stalled IIRC.