Uint is 31-bit

Open Marco012 opened this issue 2 years ago • 4 comments

I saw in the documentation that the uint type is not a 32-bit unsigned integer, but a 31-bit one. What is the point of it? Is there any way to use a 32-bit unsigned integer? This is a bit strange, and I was having trouble with my code because of it.

Marco012, Apr 21 '22 15:04

The reason is that Java and JavaScript do not provide 32-bit unsigned arithmetic. The separate type only documents that negative values are not expected.

I'd like to have a true 32-bit unsigned type. This will require non-trivial translations to Java and JavaScript.
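To make the 31-bit limit concrete, here is a minimal Java sketch. The class name `IntRange` and its members are made up for illustration; only standard `java.lang.Integer` methods are used. It shows that a Java `int` stores 32 bits, but only the lower 31 are available for non-negative values, which is the range a "no negative values expected" type can safely promise.

```java
public class IntRange {
    // Largest value a Java int can hold without going negative: 2^31 - 1.
    static final int MAX_NON_NEGATIVE = Integer.MAX_VALUE;

    // Reinterprets the 32 bits of an int as an unsigned value, widened to long.
    static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    public static void main(String[] args) {
        int allOnes = 0xFFFFFFFF; // legal Java literal, but its int value is -1
        System.out.println(MAX_NON_NEGATIVE);    // 2147483647
        System.out.println(allOnes);             // -1
        System.out.println(asUnsigned(allOnes)); // 4294967295
    }
}
```

Widening to `long` is one way to see the unsigned value, but it is a different type with a different size, which is part of why a faithful translation is not trivial.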

Which target languages do you need?

pfusik, Apr 21 '22 16:04

Hm, I understand. I was expecting it to be something like that, but didn't know it was about Java. I'm developing a lib for networking utils, so I need to be strict with data types. I am transpiling to C#, C/C++ and Java, and might add Swift in the future.

Since Java has this issue, I will need to look for a workaround. I need to see if treating an int as a uint will work the same.
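As a rough answer to whether an int can stand in for a uint in Java: operations that depend only on the bit pattern (addition, subtraction, multiplication, bitwise AND/OR/XOR) produce the same 32 bits either way, while ordering, division and conversion to text need the unsigned helpers that `java.lang.Integer` has provided since Java 8. The sketch below is illustrative only; the class name `UintAsInt` and its method names are assumptions, not part of any library.

```java
public class UintAsInt {
    // Addition wraps modulo 2^32 whether the bits are read as signed or unsigned.
    static int add(int a, int b) {
        return a + b;
    }

    // Ordering differs: 0xFFFFFFFF is -1 as a signed int, but the largest uint.
    static boolean lessThan(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    // Division must treat both operands as unsigned.
    static int divide(int a, int b) {
        return Integer.divideUnsigned(a, b);
    }

    public static void main(String[] args) {
        int big = 0xFFFFFFFF; // uint 4294967295, int -1
        System.out.println(Integer.toUnsignedString(add(big, 2)));    // 1 (wrapped)
        System.out.println(big < 1);                                  // true  (signed comparison)
        System.out.println(lessThan(big, 1));                         // false (unsigned comparison)
        System.out.println(big / 2);                                  // 0     (signed division of -1)
        System.out.println(Integer.toUnsignedString(divide(big, 2))); // 2147483647
    }
}
```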

Marco012, Apr 21 '22 17:04

I didn't think I would come to this issue again but I have a lot to talk about.

1 - No range detection for unsigned type

So, I found a problem with this. If I have this:

int a = 0xFFFFFFFF;
// or
uint a = 0xFFFFFFFF;

It will generate the following C# code:

int a = (int) 4294967295;

And this is not compilable unless I add unchecked.

2 - Problematic unsigned type

I have the following C# method:

public static uint CalculateCRC32Original(IList<byte> bytes, int length)
{
    int count = 0;
    int i, j;
    uint nextByte, crc, mask;

    i = 0;
    crc = 0xFFFFFFFF;

    while (count < length)
    {
        nextByte = bytes[i];
        crc = crc ^ nextByte;
        for (j = 7; j >= 0; j--)
        {    // Do eight times.
            mask = (uint)-(crc & 1);
            crc = (crc >> 1) ^ (0xEDB88320 & mask);
        }
        i = i + 1;
        count++;
    }

    return ~crc;
}

And I converted it to the following cito method:

public static uint CalculateCRC32(byte[] bytes, int length)
{
    int count = 0;
    int i;
    int j;
    uint nextByte;
    int crc;
    uint mask;

    i = 0;
    crc = 0xFFFFFFFF;

    while (count < length)
    {
        nextByte = bytes[i];
        crc = crc ^ nextByte;
        for (j = 7; j >= 0; j--)
        {    // Do eight times.
            mask = -(crc & 1);
            crc = (crc >> 1) ^ (0xEDB88320 & mask);
        }
        i = i + 1;
        count++;
    }
	
    return ~crc;
}

Now this code generates the following C# method:

public static int CalculateCRC32(byte[] bytes, int length)
{
	int count = 0;
	int i;
	int j;
	int nextByte;
	int crc;
	int mask;
	i = 0;
	// I needed to add unchecked to the line below
	crc = (int) 4294967295;
	while (count < length) {
		nextByte = bytes[i];
		crc = crc ^ nextByte;
		for (j = 7; j >= 0; j--) {
			mask = -(crc & 1);
			crc = (int) (crc >> 1 ^ (3988292384 & mask));
		}
		i = i + 1;
		count++;
	}
	return ~crc;
}

When I call both methods, they give different outputs:

byte[] bytes = Encoding.ASCII.GetBytes("hello guys");
byte b1 = BitConverter.GetBytes(CalculateCRC32Original(bytes, bytes.Length))[0]; // 207
byte b2 = BitConverter.GetBytes(CalculateCRC32(bytes, bytes.Length))[0]; // 60

This happens because the crc variable is int instead of uint.
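The mechanism behind the mismatch is most likely the right shift: on a signed int, `>>` copies the sign bit into the vacated top bit, whereas on a uint a zero is shifted in. The following Java sketch (class and method names are hypothetical) contrasts the two and shows the masking trick that gives the unsigned result with a signed type.

```java
public class ShiftDemo {
    // Arithmetic shift: the sign bit is copied into the vacated top bit.
    static int arithmetic(int x) {
        return x >> 1;
    }

    // Logical shift: a zero is shifted into the top bit, as for an unsigned value.
    static int logical(int x) {
        return x >>> 1;
    }

    // Same result as logical(), usable where no unsigned shift operator exists.
    static int arithmeticThenMask(int x) {
        return (x >> 1) & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        int crc = 0xFFFFFFFF;
        System.out.println(Integer.toHexString(arithmetic(crc)));         // ffffffff
        System.out.println(Integer.toHexString(logical(crc)));            // 7fffffff
        System.out.println(Integer.toHexString(arithmeticThenMask(crc))); // 7fffffff
    }
}
```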

After some time, I made it work by changing everything to long:

public static int CalculateCRC32(byte[] bytes, int length)
{
    int count = 0;
    int i;
    int j;
    long nextByte;
    long crc;
    long mask;
    i = 0;

    crc = 0xFFFFFFFF;

    while (count < length)
    {
        nextByte = bytes[i];
        crc = crc ^ nextByte;
        for (j = 7; j >= 0; j--)
        {
            mask = -(crc & 1);
            crc = (crc >> 1 ^ (3988292384 & mask));
        }
        i = i + 1;
        count++;
    }
    return (~crc & 0x7FFFFFFF);
}

I also tested this in Java and it works as expected. This worked fine in my case, but other algorithms may be more troublesome and cannot be fixed as simply as this. Below are some suggestions regarding this matter.

1 - Removing the uint type would be the best option right now: it is quick to do, and no one will be misled and fall into the same mistake as I did.

2 - Implement a real uint and other unsigned types for the languages that support them, and emit a warning for those that don't.

3 - Create new types such as Uint8, Uint16, Uint32 that have an underlying implementation in languages that don't have them.

Marco012, Apr 22 '22 14:04

int a = 0xFFFFFFFF;
// or
uint a = 0xFFFFFFFF;

And this is not compilable unless I add unchecked.

Your code is invalid. You get a legitimate compilation error from the C# compiler rather than from cito. It's not that bad. As you see, adding unchecked doesn't solve this problem at all, only eliminates the compiler error.

Check out this code:

    public static int CalculateCRC32(byte[] bytes, int length)
    {
        int crc = -1;
        for (int i = 0; i < length; i++)
        {
            crc ^= bytes[i];
            for (int j = 0; j < 8; j++)
                crc = (crc >> 1 & 0x7fffffff) ^ (0xEDB88320 & -(crc & 1));
        }
        return ~crc;
    }
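For comparison, here is a hedged Java port of the same idea: a bitwise CRC-32 that uses only signed ints. It is a sketch, not cito output. The class name `Crc32Sketch` is made up, `>>>` replaces the shift-then-mask expression, and the `& 0xFF` is needed because Java bytes are signed. The `main` method checks the result against the JDK's `java.util.zip.CRC32`.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Sketch {
    // Bitwise CRC-32 (reflected polynomial 0xEDB88320) using only signed ints.
    static int crc32(byte[] bytes, int length) {
        int crc = -1; // same bit pattern as 0xFFFFFFFF
        for (int i = 0; i < length; i++) {
            crc ^= bytes[i] & 0xFF; // Java bytes are signed, so mask to 0..255
            for (int j = 0; j < 8; j++) {
                // >>> shifts in a zero, matching (crc >> 1 & 0x7fffffff)
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
            }
        }
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] data = "hello guys".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        // Compare the 32 bits of the sketch with the JDK's implementation.
        long ours = Integer.toUnsignedLong(crc32(data, data.length));
        System.out.println(ours == reference.getValue());
    }
}
```

The returned int may be negative when printed directly; `Integer.toUnsignedLong` or `Integer.toHexString` shows it as the usual unsigned checksum.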

1 - Removing the uint type would be the best option right now: it is quick to do, and no one will be misled and fall into the same mistake as I did.

This is clearly documented.

You could make a similar error in Java by using its byte type and assuming that it's unsigned as in C#. I know students who made such an error in their networking app. It passed their tests on the 127.0.0.1 IP address.
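A small Java sketch of that pitfall, assuming an IPv4 address stored as four bytes. The class name `OctetDemo`, its methods and the second address are illustrative assumptions. Every octet of 127.0.0.1 is below 128, so the signed and unsigned readings agree and the bug stays hidden.

```java
public class OctetDemo {
    // Buggy: widens the signed byte directly, so octets >= 128 become negative.
    static int octetNaive(byte b) {
        return b;
    }

    // Correct: mask with 0xFF to recover the unsigned value 0..255.
    static int octet(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte[] loopback = {127, 0, 0, 1};
        byte[] lan = {(byte) 192, (byte) 168, 0, 1};
        System.out.println(octetNaive(loopback[0])); // 127, the bug stays hidden
        System.out.println(octetNaive(lan[0]));      // -64
        System.out.println(octet(lan[0]));           // 192
    }
}
```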

3 - Create new types such as Uint8, Uint16, Uint32 that have an underlying implementation in languages that don't have them.

What do you mean? There are byte and ushort types already.

pfusik, Apr 26 '22 18:04