Integer data types, signedness, width
While reading the spec I noticed that the choices for the integer data types are kinda weird and I'd suggest following the route C has gone with the stdint.h types.
And even though most people in C, C++, C# and Java may be used to int and long, I'd suggest deprecating those two types: at least in C and C++ you produce unportable code when using them naively, because their sizes differ across platforms (hence the suggestion to be explicit about integer sizes).
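To illustrate the concern (a minimal C sketch, nothing Ć-specific): the widths of int and long depend on the compiler and target, while the stdint.h typedefs are fixed everywhere they exist.
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* int is typically 4 bytes on desktop compilers but 2 on many 16-bit MCUs;
	   long is 8 bytes on 64-bit Linux/macOS but 4 on Windows and most 32-bit
	   targets. The stdint.h widths never vary. */
	printf("int     : %zu bytes\n", sizeof(int));
	printf("long    : %zu bytes\n", sizeof(long));
	printf("int32_t : %zu bytes\n", sizeof(int32_t));
	printf("int64_t : %zu bytes\n", sizeof(int64_t));
	return 0;
}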
Furthermore, your spec notes that explicit integer ranges like 0..100 foo; are not enforced. If they aren't enforced, this makes little sense and might even cause confusion when your compiler decides to use a different underlying data type than the programmer expects.
long in Ć is translated to int64_t in C and C++. int is 32-bit on all modern hardware.
Range types are not about range checking. They are for efficient memory representation of large arrays: it's a type that can store integers in the given range. In fact, I never use range types for non-array variables when I code in Ć. What's the use? For example, Java doesn't have an unsigned byte data type, so you normally have to do array[i] & 0xff to convert a signed byte to unsigned. But if the array elements are 0 .. 100, there's no need for the binary AND operation.
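A rough C analogy of what that buys (a sketch with made-up names, not anything cito emits): an element known to be 0 .. 100 fits in one byte, so a large array takes a quarter of the memory an int array would, and reading an element back needs no masking or sign fix-up.
#include <stdint.h>

#define N 65536

static uint8_t levels[N];	/* every element is known to stay in 0 .. 100 */

static int32_t SumLevels(void)
{
	int32_t sum = 0;	/* at most 100 * N = 6553600, which fits in 32 bits */
	for (int i = 0; i < N; i++)
		sum += levels[i];	/* no "& 0xff" needed: the stored value is already 0 .. 100 */
	return sum;
}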
> long in Ć is translated to int64_t in C and C++. int is 32-bit on all modern hardware.
But int being int32_t is not guaranteed in C for all platforms, and I have worked with MCUs where int is actually int16_t. I don't mind int being mapped to int32_t, but the translation to the target language should do it portably, as is the aim of this project.
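Just as an idea of how the assumption could at least be made explicit in the generated C (not something cito does, merely a sketch): fail the build on toolchains where int is narrower than 32 bits.
#include <limits.h>

#if INT_MAX < 2147483647
#error "generated code assumes int is at least 32 bits wide"
#endif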
> Range types are not about range checking. They are for efficient memory representation of large arrays: it's a type that can store integers in the given range. In fact, I never use range types for non-array variables when I code in Ć. What's the use? For example, Java doesn't have an unsigned byte data type, so you normally have to do array[i] & 0xff to convert a signed byte to unsigned. But if the array elements are 0 .. 100, there's no need for the binary AND operation.
How much cost would be added to the Ci compiler to actually check these "promises" by the programmer? AFAIR LLVM and GCC do some tracking of intermediate values, which lets them warn you when a condition you check is always true or always false by the time it is reached. Wouldn't this be something to include in the Ci compiler too (at least as a warning)? Especially if this decision affects code generation/transpilation, you might end up in situations where a security check like if (x > 5) no longer holds, because your promised 0..100 range accidentally wrapped around to -128 due to some calculation oversight. IMHO, without some sort of warning/enforcement this carries quite a risk of (hard-to-detect) bugs that may impact the security of the generated code.
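A contrived C sketch of that failure mode, assuming the 0..100 range were lowered to a signed 8-bit type: on the usual two's-complement targets the narrowing conversion wraps, so the "promised" value comes back negative and the later check no longer behaves as intended.
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	int8_t x = 100;	/* promised to stay within 0 .. 100 */
	x = (int8_t) (x + 60);	/* 160 doesn't fit into int8_t; wraps to -96 on two's-complement targets */
	if (x > 5)
		printf("check passed, x = %d\n", x);
	else
		printf("check failed, x = %d\n", x);	/* this branch runs */
	return 0;
}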
Yes, I'm aware of 16-bit MCUs. Emitting int32_t everywhere would destroy the performance and moreover inflate the binary size, which is at a premium in such MCUs. It would also make all the APIs harder to use for the majority of users who have 32-bit and 64-bit compilers. I'd love to have even more portability, but not at any cost. You can imagine how poorly these MCUs deal with 64-bit integers, floats and doubles, if they do at all.
Yes, range checking would be possible. In general it isn't possible at compile time. Swift does it at runtime. Python does it for typed arrays. I also think Ada does it, including ranges. I'm not sure whether you're asking about the runtime performance cost or the development effort needed for cito to inject range checks?
> Yes, I'm aware of 16-bit MCUs. Emitting int32_t everywhere would destroy the performance and moreover inflate the binary size, which is at a premium in such MCUs. It would also make all the APIs harder to use for the majority of users who have 32-bit and 64-bit compilers. I'd love to have even more portability, but not at any cost. You can imagine how poorly these MCUs deal with 64-bit integers, floats and doubles, if they do at all.
Understood. :) Any chance to at least see the stdint types be added for direct usage in Ci?
> Yes, range checking would be possible. In general it isn't possible at compile time. Swift does it at runtime. Python does it for typed arrays. I also think Ada does it, including ranges. I'm not sure whether you're asking about the runtime performance cost or the development effort needed for cito to inject range checks?
Mostly asking re development cost.
Not sure what you're asking for? stdint.h is used by cito. byte is uint8_t. short is int16_t. The only non-stdint integer types are int and ranges. Yes, I'll consider translating int to int32_t as an option. But this isn't enough. Code such as:
short s = ...;
s = s * 2 / 3;
would still overflow if int is 16-bit. cito would need to inject (int32_t) casts.
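Roughly what that injection might look like in the emitted C (a sketch, not actual cito output):
#include <stdint.h>

int16_t Scale(int16_t s)
{
	/* Without the cast, "s * 2" is computed in int; with a 16-bit int it can
	   overflow for s > 16383. Widening first keeps the intermediate exact. */
	return (int16_t) ((int32_t) s * 2 / 3);
}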
As for range checks, my design goal was that the generated code remains human-readable and obviously corresponds to the source Ć code. Range checks with ifs would definitely obfuscate it. This isn't an easy change. Optimally, there would be some kind of __attribute__ to specify the range for the type, and GCC/Clang (or perhaps C#/Java/etc.) would inject the checks. A recent addition to C# is contracts. They probably aren't powerful enough to check ranges everywhere, only as pre- and post-conditions. But I think that's a good start.
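For comparison, this is the kind of check cito would have to scatter through the generated C to enforce a 0 .. 100 field at runtime (the names are invented for illustration), which is exactly the readability cost mentioned above:
#include <stdint.h>
#include <stdlib.h>

static int CiCheckRange(int value, int min, int max)
{
	if (value < min || value > max)
		abort();	/* or set an error flag, throw in C++/C#/Java, etc. */
	return value;
}

void Foo_SetLevel(uint8_t *level, int value)
{
	/* every store to the ranged field has to go through the check */
	*level = (uint8_t) CiCheckRange(value, 0, 100);
}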
What I meant regarding the int types was whether it's possible in Ci to write something like:
uint64_t foo = 42;
if I want to be explicit, or am I forced to use
long foo = 42;
?
long is how C#, Java, OpenCL and Ć spell 64-bit integers.
C:\0\ci>type 60.ci
public class Test
{
	public long Bar()
	{
		long foo = 42;
		return foo;
	}
}
C:\0\ci>cito -o 60.cpp 60.ci
C:\0\ci>type 60.cpp
// Generated automatically with "cito". Do not edit.
#include "60.hpp"
int64_t Test::bar() const
{
	int64_t foo = 42;
	return foo;
}
First, Ć syntax resembles C# more than C++. Second, int64_t is longer to type and read than long. Third, I think that adding alias names would only increase confusion.
You have a point that there's currently no unsigned 64-bit integer and no unsigned 32-bit integer. These two are problematic in Java, which doesn't support them natively. I'd like to add them to Ć at some point.
I am not going to change the current spellings. int is easier to read, say and type than int32_t, and long than int64_t.
The missing 32-bit uint and 64-bit ulong are tracked in #61.