sv-benchmarks icon indicating copy to clipboard operation
sv-benchmarks copied to clipboard

More precise data model in task definitions?

Open PhilippWendler opened this issue 3 years ago • 6 comments

Currently, the data_model key in a task definition is allowed to have the values ILP32 or LP64. This does not contain enough information to deduce the size of all possible C types.

For example, on 32-bit x86 Linux machines (a commonly known ILP32 platform), gcc uses a format with 80 bit precision to implement long doubles. This is somewhat hidden in the documentation, one can for example deduce it from the sentence "Notice that neither of these options enable any extra precision over the x87 standard of 80 bits for a long double." in the description of the option -m96bit-long-double in gcc's x86 docs, but one also confirm this by trying it out. clang seems to do the same.

However, on 32-bit ARM Linux machines, long double is defined as 64 bit size (source) although this is also an ILP32 platform.

The C standard does not say anything concrete about long double, only that it needs to be at least as large as double (which is fulfilled in both cases). Wikipedia has an overview over the precision of long double on many platforms and operating systems (it can vary from 64 to 128 bit).

This actually affects results and safety of programs, there is even an existing example in this repository: https://github.com/sosy-lab/sv-benchmarks/blob/master/c/floats-esbmc-regression/digits_for.c This program contains a variable x that is declared as long double, and the task is defined to have data model ILP32. However, this task is only safe if long double has more than 64 bits (not sure exactly how many bits are needed, 80 bits are enough). This can even be easily shown without the need for an ARM machine by compiling on x86, once with gcc -mlong-double-64 (program will violate assertion and crash) and once with gcc -mlong-double-80 (program will not violate assertion).

So given only the data model ILP32 and assuming the program to be in GNU C (according to the SV-COMP rules), one cannot claim that this program is safe.

A solution would be to use a more informative string for the data model, e.g., something like x86_64-linux. Then it would be clear that long double has a precision of 80 bit.

Another solution would be to exactly define the size and precision of each C type in the repo's documentation for both the ILP32 and LP64 data models.

Note that if we do the latter, we indeed need to define precision and size of a data type because these can be different. On 64-bit x86 Linux gcc uses 80 bit precision but 128 bit size for long double, on 32-bit x86 Linux gcc uses 80 bit precision and 96 bit size for long double.

A last possible solution would be to explicitly leave the size of long double and similar data types unspecified and claim that the linked program is unsafe (because there exists an ILP32 platform where it violates the assertion). However, this would be extremely difficult for verifiers to implement (particularly but not only verifiers based on some compiler infrastructure). Basically this would be as difficult has having to verify a program for both ILP32 and LP64 at the same time, and we also do not require this.

PhilippWendler avatar Sep 22 '20 14:09 PhilippWendler

A solution would be to use a more informative string for the data model, e.g., something like x86_64-linux.

I'm in favor of this. I would replace ILP32 and LP64 with gcc target triplets that match our current type expectations.

This is concise and well established.

lembergerth avatar Oct 02 '20 14:10 lembergerth

Should this be done before or after this year's competition?

MartinSpiessl avatar Oct 29 '20 20:10 MartinSpiessl

@dbeyer If we want to do this to fix the current underspecification, it would be good to do this soon, preferably before the current data-model names get encoded in lots of tool-info modules.

PhilippWendler avatar Oct 30 '20 11:10 PhilippWendler

In #1217 it came up that not only the size of some types is currently not specified, but also the byte order. Using target triplets would cover this, otherwise we should probably also add this information.

PhilippWendler avatar Nov 09 '20 16:11 PhilippWendler

Just read this. Personally, I like the idea of using target triplets.

zvonimir avatar Nov 09 '20 16:11 zvonimir

Some of the underspecified types are specified in the System V ABI[1], which seems to be a somewhat standard on unix-like systems:

The calling convention of the System V AMD64 ABI is followed on Solaris, Linux, FreeBSD, macOS,[23] and is the de facto standard among Unix and Unix-like operating systems.

quoted from: https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI

This for example specifies the size of long double and also the signedness of the type char[2]. There is also a version of the System V ABI for "32bit"/i386[3].

So we might be able to fix these by adding to the rules that we adhere to the System V ABI.

EDIT: of course we could also make this an option in the data-model field, i.e., whether we follow System V ABI or not. An alternative would be the Microsoft (x64) software conventions or just the Microsoft C language reference in general that also list various type sizes. But I guess this would already be kind of subsumed by the gcc triplet that @lembergerth mentioned. If that one contains linux then System V ABI is implied, if it is Windows then it is quite clear that the Microsoft software conventions shall apply.

[1] https://wiki.osdev.org/System_V_ABI [2] https://uclibc.org/docs/psABI-x86_64.pdf [3] https://www.uclibc.org/docs/psABI-i386.pdf

MartinSpiessl avatar Jul 06 '21 06:07 MartinSpiessl