corrode icon indicating copy to clipboard operation
corrode copied to clipboard

Arrays are broken!

Open lamarqua opened this issue 8 years ago • 12 comments

It seems like arrays are decayed into mere pointers. This causes all sorts of problems. Let's start a very simple test example (one that does not require stdio):

int main() { int array[1]; array[0] = 42; return array[0]; }

[adrien@MacBookPro ~/dev/repos/corrode/tests]$ gcc -Wall array.c -o array
[adrien@MacBookPro ~/dev/repos/corrode/tests]$ ./array; echo $?
42

So far so good. But Corrode translates that to:

fn main() {
    let ret = unsafe { _c_main() };
    std::process::exit(ret);
}

#[no_mangle]
pub unsafe fn _c_main() -> i32 {
    let mut array : *mut i32;
    {
        let _rhs = 42i32;
        let _lhs = &mut *array.offset(0i32 as (isize));
        *_lhs = _rhs;
    }
    *array.offset(0i32 as (isize))
}

Calling rustc on the generated code:

[adrien@MacBookPro ~/dev/repos/corrode/tests]$ rustc array.rs
array.rs:11:26: 11:31 error: use of possibly uninitialized variable: `array` [E0381]
array.rs:11         let _lhs = &mut *array.offset(0i32 as (isize));
                                     ^~~~~
array.rs:11:26: 11:31 help: run `rustc --explain E0381` to see a detailed explanation
array.rs:14:6: 14:11 error: use of possibly uninitialized variable: `array` [E0381]
array.rs:14     *array.offset(0i32 as (isize))
                 ^~~~~
array.rs:14:6: 14:11 help: run `rustc --explain E0381` to see a detailed explanation
error: aborting due to 2 previous errors

rustc legitimately complains that we're not initializing the arrray, and he's right! We're never allocating memory anywhere.

Rust has an array type, which I think is appropriate to handle these kind of cases. I feel like the semantics between C arrays and Rust arrays are similar, but I haven't dived deep into it.

There are a couple interesting things to note about arrays, and how we can handle them in Corrode.

First, through some mechanism unknown to me, they are decayed into pointers. I feel like this has to do with baseTypeOf and derivedTypeOf (which seem to ignore the relevant information from CDerivedDeclarator), but I can't say I understand this part of the code quite yet. My understanding is that there isn't an equivalent Rust.Type either, which would contain the relevant information for the translation.

Arrays have a whole bunch of interesting properties:

  • their size can be omitted, making the type of the declaration incomplete, but a subsequent declaration can fix this and make the type complete: 6.2.5.22

22 An array type of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage).

This is interesting because I am not sure how the language-c package would handle cases like this:

int global_array[];
// [...]
int global_array[25];

When we access the relevant information parsed by language-c, we are definitely more interested in knowing that global_array has 25 elements.

  • Ever since C99, arrays can have variable length - they are still allocated on the stack though. But this feature was retrograded to optional in C11 from mandatory in C99, and it is generally frowned upon by a subset of C developers. Implementing this in Rust while preserving the semantics may be hard! (the alloca POSIX function [and its Windows] equivalent) may be used instead?
  • Arrays are decayed into pointers when they are passed as an argument to a function. But also, TIL, you can have a static specifier within the brackets to hint the compiler that the array (while still being decayed) must contain at least N elements! This is amazing and I didn't know that despite having worked with C for a very long time. Having quickly tested it, it seems like language-c accepts it (but I don't know whether it holds the info somewhere, and whether this info would be useful at all in translating to Rust - my gut feeling is probably not).
  • Another interesting thing is the interaction between sizeof and arrays. The interesting part to know is that if we declare an array, say:

int array[25];

Using the symbol array anywhere in C code is gonna be completely equivalent to &array[0], except in the case where we take the sizeof of it! In the case of VLA, we want to return the expression used to compute the number of elements in the array times the size of an element in the array. In the case of a non-VLA we simply return the size known at compile time (#elements x sizeof(element)). It is a very common idiom to have a macro defined as such:

#define ARRAY_COUNT(x) (sizeof(x) / (sizeof((x)[0])

You can then use this to get, for example, the size of a compile-time declared string (provided it's declared as an incomplete array):

const char s[] = "Corrode rules!";
size_t len_s = ARRAY_COUNT(s);

I discovered a few other broken things related to sizeof and arrays, but I'll talk about it in detail in the sizeof issue :).

lamarqua avatar Jul 28 '16 04:07 lamarqua

Yes, I implemented a cheesy hack instead of properly handling array types. The current hack gets arrays translated correctly in function arguments, but gets them wrong everywhere else.

Thank you for the detailed and insightful comments on how to do it properly. I didn't know about that use of the static keyword either, for example!

I'd add that incomplete array types may also be completed by having an initializer. Your last example shows that:

const char s[] = "Corrode rules!";

But here are a couple more examples:

int a[] = { 1, 2, 3 };
int b[] = { [10] = 1 };

a has three elements and b has... um... 11? I dunno, fencepost errors are hard.

Fully fixing array handling probably needs to wait for #38 to get merged so we have a solid foundation for handling initializers. But you could certainly fix most of the current bugs just by getting the internal type representation right. I think that means:

  • Add an IsArray constructor alongside IsPtr;
  • Check every current use of IsPtr and see if it needs to handle IsArray as well (maybe write a helper that converts IsArray to IsPtr and use that any place where the two should be equivalent?);
  • Fix the derive (CArrDeclr ...) case to produce IsArray instead of IsPtr;
  • Have the derive (CFunDeclr ...) case convert IsArray back to IsPtr for function arguments... I think?

It might be worth tackling #18 before doing this so it's more clear which bits of syntax allow pointers, which would help in deciding which ones should also be able to handle arrays.

Oh, and C99 says somewhere that functions can't return array types (citation needed). That's easy to check for in the derive (CFunDeclr ...) case so we might as well add a call to badSource if that comes up.

jameysharp avatar Jul 28 '16 07:07 jameysharp

Thanks for the detailed implementation plan!

A few more thoughts and remarks in general:

  • VLAs may prove difficult to implement in general in a "pure" semantics preserving way as it seems that Rust doesn't allow dynamically sized types on the stack ; but allocating them on the heap with Vectors should be totally doable on the other hand.
  • Initializers for arrays are unsupported! At least, for arrays of length >= 2.
void f() {
    int array[2] = { 1, 2 };
}
("array2.c": line 25): illegal scalar initializer; check whether a real C compiler accepts this:
    { 1, 2 }

Is the Corrode code that handles the same as the one referred to in issue #38 ?

  • As for

Oh, and C99 says somewhere that functions can't return array types (citation needed).

The relevant part is 6.7.5.3, §1

A function declarator shall not specify a return type that is a function type or an array type

So we probably need to check for that as well.

I think I may have spotted a couple other issues, but can't remember them on top of my head nor I can find where I put my notes...

lamarqua avatar Jul 29 '16 15:07 lamarqua

I'd like to get patches adding support for arrays that is correct except for not having VLA support at first. Variable length arrays have enough challenging quirks that I think it's worth deferring them in order to focus on handling the basics. That said, translating them as Vec someday seems OK, probably? I haven't thought about that very hard yet.

Yes, pull request #38 should be fixing most of the initializer bugs soon. But the error you're seeing isn't because initializing single-element arrays works (it doesn't); it's because I translated array types as pointers, which can have one value assigned to them. Even after merging #38, we'll need to extend it to get arrays initialized correctly too, but that'll be much easier than what I've put Alec through so far :wink:

We don't, strictly speaking, have to check any of C's constraints, such as not returning arrays from functions. I don't expect Corrode to produce all the diagnostics a real C compiler would; I don't mind requiring people to check that their code passes GCC or similar before running it through Corrode. But when it's convenient to check, and we might generate wrong code, it's nice to report that during translation.

jameysharp avatar Jul 29 '16 16:07 jameysharp

Just a comment about array types. The C standard talks a bit about incomplete array types (section 6.2.5 point 22). That makes me a bit unsure about how exactly we should be representing and using IsArray. Also worth checking out is, CArraySize in language-c.

And yeah, once #38 gets merged in and IsArray is figured out, array initialization should be fairly straightforward.

harpocrates avatar Jul 29 '16 16:07 harpocrates

I've merged #38, and I fixed #18 earlier, so I think the path is clear to push forward on this now!

jameysharp avatar Jul 31 '16 11:07 jameysharp

Awesome, I will get to it very soon then!

lamarqua avatar Jul 31 '16 17:07 lamarqua

Just leaving a comment to say that I'm actually working on it as of this week. Hoping to push an imperfect PR in the next days :-). I am interleaving this with learning the Haskell I need as I go (the lazy approach to learning, how fitting!), which is why it's not the fastest it could be. I wanna do more stuff on the testing infra as well, since it seems not much has been done on that front either, but it'll be a separate "sprint".

lamarqua avatar Sep 21 '16 14:09 lamarqua

@lamarqua any update on this?

emberian avatar Nov 06 '16 22:11 emberian

Not sure if this is the same issue as the one I am hitting but it seems like it

corrode -c main.c -Iinclude/ -Wall -Wextra
<command-line>:0:1: warning: undefining "__LINE__"
("/usr/include/libio.h": line 310): Corrode doesn't handle this yet:
    _unused2[15 * sizeof(int) - 4 * sizeof(void *) - sizeof(size_t)]
makefile:27: recipe for target 'main.o' failed
make: *** [main.o] Error 1

0xmichalis avatar Dec 17 '16 13:12 0xmichalis

I tried to convert afl-fuzz tonight. Ran into this as well I think:

<command-line>:0:1: warning: undefining "__LINE__"
("afl-fuzz.c": line 134): Corrode doesn't handle this yet:
virgin_bits[1 << 16]

frewsxcv avatar Feb 07 '17 04:02 frewsxcv

@lamarqua could you take another look at this issue?

flip111 avatar Mar 14 '17 09:03 flip111

another array issue:

$ cat test.c
int main() {
  char a[128];
  return *a;
} 

$ corrode test.c
<command-line>:0:1: warning: undefining "__LINE__" [enabled by default]
("test.c": line 3): illegal dereference of non-pointer; check whether a real C compiler accepts this:
    *a

martindemello avatar Jul 04 '17 05:07 martindemello