Carp icon indicating copy to clipboard operation
Carp copied to clipboard

String.allocate with NULL as the initial char value causes unexpected behaviors.

Open Liorst4 opened this issue 5 years ago • 12 comments

  • The assert that checks out of bound access in String_string_MINUS_set_BANG_ uses strlen instead of the value given to malloc. This makes the function unusable on empty strings.

  • In the core implementation the length of a string is the strlen of it, and not the initial value given to malloc, this should be documented.

(let [my-string (String.allocate 10 (Char.from-int 0))]
     (do
      (println* (String.length &my-string)) ;; Length is 0, not 10
      (String.string-set! &my-string 1 'a') ;; Assertion fails (Thinks its out of bounds because it checks that the index is less than `strlen`
      ))

Liorst4 avatar Aug 24 '20 14:08 Liorst4

Running this example as-is with latest Carp from the master branch, resulted in:

[PARSE ERROR] "REPL" (line 4, column 43): unexpected ")" expecting "&", "~", "@", "'", "`", "%@", "%", "(", "$[", "[", "{", "-", digit, "0b", "0x", "#\"", "r\"", "\"", "\\" or letter

Changing the code code to (use \a instead of 'a'):

(let [my-string (String.allocate 10 (Char.from-int 0))]
     (do
      (println* (String.length &my-string))
      (String.string-set! &my-string 1 \a)))

Got rid of the parse error, but resulted in:

I did not understand the form `(external String.allocate (Fn [Int Char] String))` at /home/dev/src/leblowl/Carp/core/String.carp:35:22.

Traceback:
  (String.allocate 10 (Char.from-int 0)) at REPL:5:17.
(let [my-string (String.allocate 10 (Char.from-int 0))] (do ...) at REPL:5:1.

So maybe I am running into a separate bug or misusing the language, but this works for demonstrating the index out of bounds:

(def my-string (String.allocate 10 (Char.from-int 0)))
(println* (String.length &my-string))
(String.string-set! &my-string 1 \a)

/home/dev/src/leblowl/Carp/core/carp_string.h:24: bad index: 1 < 0
[RUNTIME ERROR] 'out/Untitled' exited with return value -6.

and:

(String.string-set! &my-string 0 \a)

/home/dev/src/leblowl/Carp/core/carp_string.h:24: bad index: 0 < 0
[RUNTIME ERROR] 'out/Untitled' exited with return value -6.

leblowl avatar Oct 04 '21 02:10 leblowl

The "I did not understand the form" error is a small but (probably) easy bug to fix that's been irritating me for a while.

We're just missing a case in the REPL code that can understand "external" forms, since the case is missing, external functions aren't executable at the top level in the REPL, but they work if you wrap them in a function (as you figured out).

scolsen avatar Oct 04 '21 16:10 scolsen

I looked into this quickly.

This gives the same error:

鲤 (let [s (allocate 10 \x)] s)
I did not understand the form `(external String.allocate (Fn [Int Char] String))` at ...

Traceback:
  (allocate 10 \x) at REPL:15:9.
(let [s (allocate 10 \x)] s) at REPL:15:1.

but this works:

鲤 (allocate 10 \x)
xxxxxxxxxx

So it has something to do with being wrapped inside let..?

eriksvedang avatar Oct 05 '21 14:10 eriksvedang

If we’re hitting a traceback, we’re missing external as a case for HasStaticCall most likely, and the let is evaluated dynamically rather than statically.

hellerve avatar Oct 08 '21 15:10 hellerve

@hellerve Yeah, that's what I thought too but we have a case for it, that's why the second example works above. It somehow stops working when inside a let though.

eriksvedang avatar Oct 08 '21 15:10 eriksvedang

I found the bug, it was related to applications and static code (you’ll see in my PR). I’m now hunting down the occasional hangs you sometimes find when static code gets mixed up in dynamic code. Hopefully I’ll be able to fix it today, otherwise I’ll split up the PR!

hellerve avatar Oct 08 '21 20:10 hellerve

@hellerve Did you find those hangs? Can I go ahead and merge?

eriksvedang avatar Oct 11 '21 19:10 eriksvedang

You can merge, still working on the hangs. They have to do with expandAll and fixpoints, I’m trying to figure out how to fix it!

hellerve avatar Oct 12 '21 14:10 hellerve

@hellerve If you have a repro for that maybe file a separate issue so it doesn't get lost.

eriksvedang avatar Oct 12 '21 19:10 eriksvedang

Is the solution here to use the actual size of the string literal/allocated memory when checking bounds in String_string_MINUS_set_BANG_?

I am not super familiar with C, but it looks like it's not possible to reliably get the size of an allocated block of memory without storing that size somewhere. (https://stackoverflow.com/questions/1518711/how-does-free-know-how-much-to-free, https://stackoverflow.com/questions/492384/how-to-find-the-sizeof-a-pointer-pointing-to-an-array)

Should we allocate extra memory in the string and store size information?

leblowl avatar Oct 28 '21 03:10 leblowl

That'd be the way to do it.

So far we've stuck to Carp strings being exactly C:s char*, which is nice since it becomes immeditely compatible with C api:s, but also has a lot of drawbacks of course. I think it's something we could consider changing, but it's a pretty big change. The main constraint that has to be kept in that case is that we must support non-malloced (binary embedded) strings as the default. That shouldn't be too hard actually (I think when we tried this the first time we lacked a lot of mechanics, like lifetimes, which made it hard to do in a proper way.)

So in short – we should probably switch to our own custom string, but it's a big overhaul. I think it would be better to do it when everything else has settled down a bit.

eriksvedang avatar Oct 29 '21 07:10 eriksvedang

I’d just like to add that the other reason why we haven’t done this is to stay compatible with C APIs (just handing over plain char*s makes using C easier). That’s the reason #119 stalled IIRC.

hellerve avatar Oct 29 '21 13:10 hellerve