corrode
corrode copied to clipboard
Arrays are broken!
It seems like arrays are decayed into mere pointers. This causes all sorts of problems. Let's start a very simple test example (one that does not require stdio):
int main() { int array[1]; array[0] = 42; return array[0]; }
[adrien@MacBookPro ~/dev/repos/corrode/tests]$ gcc -Wall array.c -o array
[adrien@MacBookPro ~/dev/repos/corrode/tests]$ ./array; echo $?
42
So far so good. But Corrode translates that to:
fn main() {
let ret = unsafe { _c_main() };
std::process::exit(ret);
}
#[no_mangle]
pub unsafe fn _c_main() -> i32 {
let mut array : *mut i32;
{
let _rhs = 42i32;
let _lhs = &mut *array.offset(0i32 as (isize));
*_lhs = _rhs;
}
*array.offset(0i32 as (isize))
}
Calling rustc on the generated code:
[adrien@MacBookPro ~/dev/repos/corrode/tests]$ rustc array.rs
array.rs:11:26: 11:31 error: use of possibly uninitialized variable: `array` [E0381]
array.rs:11 let _lhs = &mut *array.offset(0i32 as (isize));
^~~~~
array.rs:11:26: 11:31 help: run `rustc --explain E0381` to see a detailed explanation
array.rs:14:6: 14:11 error: use of possibly uninitialized variable: `array` [E0381]
array.rs:14 *array.offset(0i32 as (isize))
^~~~~
array.rs:14:6: 14:11 help: run `rustc --explain E0381` to see a detailed explanation
error: aborting due to 2 previous errors
rustc
legitimately complains that we're not initializing the arrray, and he's right! We're never allocating memory anywhere.
Rust has an array type, which I think is appropriate to handle these kind of cases. I feel like the semantics between C arrays and Rust arrays are similar, but I haven't dived deep into it.
There are a couple interesting things to note about arrays, and how we can handle them in Corrode.
First, through some mechanism unknown to me, they are decayed into pointers. I feel like this has to do with baseTypeOf
and derivedTypeOf
(which seem to ignore the relevant information from CDerivedDeclarator
), but I can't say I understand this part of the code quite yet. My understanding is that there isn't an equivalent Rust.Type either, which would contain the relevant information for the translation.
Arrays have a whole bunch of interesting properties:
- their size can be omitted, making the type of the declaration incomplete, but a subsequent declaration can fix this and make the type complete: 6.2.5.22
22 An array type of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage).
This is interesting because I am not sure how the language-c
package would handle cases like this:
int global_array[];
// [...]
int global_array[25];
When we access the relevant information parsed by language-c
, we are definitely more interested in knowing that global_array has 25 elements.
- Ever since C99, arrays can have variable length - they are still allocated on the stack though. But this feature was retrograded to optional in C11 from mandatory in C99, and it is generally frowned upon by a subset of C developers. Implementing this in Rust while preserving the semantics may be hard! (the
alloca
POSIX function [and its Windows] equivalent) may be used instead? - Arrays are decayed into pointers when they are passed as an argument to a function. But also, TIL, you can have a static specifier within the brackets to hint the compiler that the array (while still being decayed) must contain at least N elements! This is amazing and I didn't know that despite having worked with C for a very long time. Having quickly tested it, it seems like language-c accepts it (but I don't know whether it holds the info somewhere, and whether this info would be useful at all in translating to Rust - my gut feeling is probably not).
- Another interesting thing is the interaction between sizeof and arrays. The interesting part to know is that if we declare an array, say:
int array[25];
Using the symbol array
anywhere in C code is gonna be completely equivalent to &array[0]
, except in the case where we take the sizeof of it! In the case of VLA, we want to return the expression used to compute the number of elements in the array times the size of an element in the array. In the case of a non-VLA we simply return the size known at compile time (#elements x sizeof(element)). It is a very common idiom to have a macro defined as such:
#define ARRAY_COUNT(x) (sizeof(x) / (sizeof((x)[0])
You can then use this to get, for example, the size of a compile-time declared string (provided it's declared as an incomplete array):
const char s[] = "Corrode rules!";
size_t len_s = ARRAY_COUNT(s);
I discovered a few other broken things related to sizeof and arrays, but I'll talk about it in detail in the sizeof issue :).
Yes, I implemented a cheesy hack instead of properly handling array types. The current hack gets arrays translated correctly in function arguments, but gets them wrong everywhere else.
Thank you for the detailed and insightful comments on how to do it properly. I didn't know about that use of the static
keyword either, for example!
I'd add that incomplete array types may also be completed by having an initializer. Your last example shows that:
const char s[] = "Corrode rules!";
But here are a couple more examples:
int a[] = { 1, 2, 3 };
int b[] = { [10] = 1 };
a
has three elements and b
has... um... 11? I dunno, fencepost errors are hard.
Fully fixing array handling probably needs to wait for #38 to get merged so we have a solid foundation for handling initializers. But you could certainly fix most of the current bugs just by getting the internal type representation right. I think that means:
- Add an
IsArray
constructor alongsideIsPtr
; - Check every current use of
IsPtr
and see if it needs to handleIsArray
as well (maybe write a helper that convertsIsArray
toIsPtr
and use that any place where the two should be equivalent?); - Fix the
derive (CArrDeclr ...)
case to produceIsArray
instead ofIsPtr
; - Have the
derive (CFunDeclr ...)
case convertIsArray
back toIsPtr
for function arguments... I think?
It might be worth tackling #18 before doing this so it's more clear which bits of syntax allow pointers, which would help in deciding which ones should also be able to handle arrays.
Oh, and C99 says somewhere that functions can't return array types (citation needed). That's easy to check for in the derive (CFunDeclr ...)
case so we might as well add a call to badSource
if that comes up.
Thanks for the detailed implementation plan!
A few more thoughts and remarks in general:
- VLAs may prove difficult to implement in general in a "pure" semantics preserving way as it seems that Rust doesn't allow dynamically sized types on the stack ; but allocating them on the heap with Vectors should be totally doable on the other hand.
- Initializers for arrays are unsupported! At least, for arrays of length >= 2.
void f() {
int array[2] = { 1, 2 };
}
("array2.c": line 25): illegal scalar initializer; check whether a real C compiler accepts this:
{ 1, 2 }
Is the Corrode code that handles the same as the one referred to in issue #38 ?
- As for
Oh, and C99 says somewhere that functions can't return array types (citation needed).
The relevant part is 6.7.5.3, §1
A function declarator shall not specify a return type that is a function type or an array type
So we probably need to check for that as well.
I think I may have spotted a couple other issues, but can't remember them on top of my head nor I can find where I put my notes...
I'd like to get patches adding support for arrays that is correct except for not having VLA support at first. Variable length arrays have enough challenging quirks that I think it's worth deferring them in order to focus on handling the basics. That said, translating them as Vec someday seems OK, probably? I haven't thought about that very hard yet.
Yes, pull request #38 should be fixing most of the initializer bugs soon. But the error you're seeing isn't because initializing single-element arrays works (it doesn't); it's because I translated array types as pointers, which can have one value assigned to them. Even after merging #38, we'll need to extend it to get arrays initialized correctly too, but that'll be much easier than what I've put Alec through so far :wink:
We don't, strictly speaking, have to check any of C's constraints, such as not returning arrays from functions. I don't expect Corrode to produce all the diagnostics a real C compiler would; I don't mind requiring people to check that their code passes GCC or similar before running it through Corrode. But when it's convenient to check, and we might generate wrong code, it's nice to report that during translation.
Just a comment about array types. The C standard talks a bit about incomplete array types (section 6.2.5 point 22). That makes me a bit unsure about how exactly we should be representing and using IsArray
. Also worth checking out is, CArraySize
in language-c
.
And yeah, once #38 gets merged in and IsArray
is figured out, array initialization should be fairly straightforward.
I've merged #38, and I fixed #18 earlier, so I think the path is clear to push forward on this now!
Awesome, I will get to it very soon then!
Just leaving a comment to say that I'm actually working on it as of this week. Hoping to push an imperfect PR in the next days :-). I am interleaving this with learning the Haskell I need as I go (the lazy approach to learning, how fitting!), which is why it's not the fastest it could be. I wanna do more stuff on the testing infra as well, since it seems not much has been done on that front either, but it'll be a separate "sprint".
@lamarqua any update on this?
Not sure if this is the same issue as the one I am hitting but it seems like it
corrode -c main.c -Iinclude/ -Wall -Wextra
<command-line>:0:1: warning: undefining "__LINE__"
("/usr/include/libio.h": line 310): Corrode doesn't handle this yet:
_unused2[15 * sizeof(int) - 4 * sizeof(void *) - sizeof(size_t)]
makefile:27: recipe for target 'main.o' failed
make: *** [main.o] Error 1
I tried to convert afl-fuzz tonight. Ran into this as well I think:
<command-line>:0:1: warning: undefining "__LINE__"
("afl-fuzz.c": line 134): Corrode doesn't handle this yet:
virgin_bits[1 << 16]
@lamarqua could you take another look at this issue?
another array issue:
$ cat test.c
int main() {
char a[128];
return *a;
}
$ corrode test.c
<command-line>:0:1: warning: undefining "__LINE__" [enabled by default]
("test.c": line 3): illegal dereference of non-pointer; check whether a real C compiler accepts this:
*a