libsoundio icon indicating copy to clipboard operation
libsoundio copied to clipboard

Use size_t for array lengths

Open Yepoleb opened this issue 8 years ago • 23 comments

Most of the _count fields should have the type size_t instead of int, because it's the official type for array sizes. While they could be interpreted as just the number of supported channels, that's not what 95% of the use cases are.

Yepoleb avatar Nov 01 '16 19:11 Yepoleb

The problem with size_t is that, as it is an unsigned type, you get signedness warnings if you do e.g.

    for (int i = 0; i < soundio_thing.foo_count; i++) {

(I usually stick in an (int) cast when this comes up, but of course it's better to avoid the need.)

You could make i a size_t as well, but then that makes a reverse loop more complicated:

    // Danger! Loop continue-condition is always true
    for (size_t i = soundio_thing.foo_count; i >= 0; --i) {

Then if you have i interacting with other signed integers, you either have to add more casts, or standardize on size_t.

ssize_t exists, but it may be a 64-bit type, and some compilers will warn about mixing them with ints. (And IIRC, Windows does not provide this type.)

iskunk avatar Nov 01 '16 19:11 iskunk

Then stop using int for array iteration, it's the wrong type. There's a reason someone added the size_t typedef to the standard and you're completely eliminating its benefits. The reverse loop is wrong because it starts at an index that's 1 bigger than the last one in the array.

The compiler is right when warning you about assuming all array indices are signed 32bit integers. That's not always the case and it makes sense to require the explicit permission from the programmer to do a type conversion.

Yepoleb avatar Nov 01 '16 20:11 Yepoleb

size_t exists for the purpose of representing the size of memory objects. On a 32-bit system, it's 32 bits (max object size = 4,294,967,296 bytes), on a 64-bit system it's 64 bits (max object size = 1.8e19 bytes).

I haven't seen anything about int not being an appropriate type for array iteration; do you have any citations for that?

iskunk avatar Nov 01 '16 20:11 iskunk

I couldn't find any official documents stating that size_t should be used, but there are a few StackOverflow questions about the topic, like this one: http://stackoverflow.com/questions/1951519/when-should-i-use-stdsize-t. It just seems obvious to not use a type that's too small to index all possible array elements.

Yepoleb avatar Nov 01 '16 21:11 Yepoleb

It just seems obvious to not use a type that's too small to index all possible array elements.

You're really looking forward to 2.2 billion channel audio, aren't you? ^_^

More seriously: int has always been the most commonly used integer type in C. It works for most things. Sometimes, you do have to deal with values north of a couple billion, and you've got options: size_t (at least 4.3 billion), [u]int64_t (9.2e18 or 1.8e19), even floating-point types. But ordinary things like sample rates, channel counts, and so on, don't get anywhere near those magnitudes.

Maybe a case can be made for size_t in a few places, like the return value of soundio_ring_buffer_capacity() (as this describes the size of a memory object). But for more prosaic things... I think application developers would tend to find the warnings/casting annoying (as well as admonitions not to use int). And even at the semantic level, making e.g. channel_count a size_t strikes me as bizarre (is each channel supposed to be a byte, or something?).

iskunk avatar Nov 01 '16 22:11 iskunk

There are performance implications to this. I did some research recently and found out that size_t performs better than int for mersenne twister 64 bit implementation on x86_64. That combined with the fact that in C, size_t is the correct size for array indexes means I think this issue is correct and I should make this adjustment in the new version.

andrewrk avatar Nov 01 '16 22:11 andrewrk

There are performance implications to this. I did some research recently and found out that size_t performs better than int for mersenne twister 64 bit implementation on x86_64.

That involves [mb]illions of iterations in a tight loop. Where in libsoundio are you doing anything remotely similar?

I don't think it's established that size_t is the "correct" type for array subscripts; merely that int is insufficient to dereference all possible arrays. Do you have a citation?

I'm reminded of this quote from Donald Knuth:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

iskunk avatar Nov 01 '16 22:11 iskunk

All array length macros return size_t as well as all C++ and Boost containers. If you want some more opinions just search SO.

Yepoleb avatar Nov 01 '16 23:11 Yepoleb

Hi there, I've seen that one can use ptrdiff_t for negative indexing if appropriate. However I'm not sure if it's the right solution for the reversed loop problem.

gracicot avatar Nov 02 '16 13:11 gracicot

@Yepoleb You're right, but all prominent figures of the C++ committee agree that using an unsigned type was a mistake.

There is a performance implication in using unsigned type because the overflow is defined, whereas for signed type, it can safely just do an increment/add without checking for overflow.

I think there are two orthogonal issues, one is having the correct semantic, the other is having the fastest indexing. IMHO, you can have both by doing typedef int_fast32_t SoundIOSize, or some other name that will express fact that the variable holds a byte count.

my 2 cents,

hotgloupi avatar Nov 02 '16 15:11 hotgloupi

Yes, a signed type might have been the better choice, but they made it unsigned and that's what we're stuck with now. You can change your library, but you can't change the standard and mixing both types just makes ugly code.

The performance implications are probably negligible, because compilers would optimize it, if it makes that much of a difference. I think it's safe for them to assume that we won't be needing that 64th bit for a very long time.

Yepoleb avatar Nov 02 '16 16:11 Yepoleb

This is something I thought a lot about in zig. Ultimately I think that it comes down to the fact that you could potentially index a pointer with this type. Therefore it should have the same range as an address, which is unsigned and is the same size as a pointer on the system.

Some of the gotchas here happen because of what I consider to be a mistake in the C language, which is that unsigned integer overflow is defined to wrap. Instead, I believe that integer overflow should be undefined for both unsigned and signed integers. This way (1) compilers can generate overflow checks for breaking this assumption, and there can be explicit wrapping operations, and (2) compilers can optimize with the assumption that unsigned overflow will never happen.

The above is how it works for Zig, but what does that mean for C? I think that size_t is still the correct data type for array indexes. It's unfortunate that there are some gotchas, but that's the world of C that we live in. Earlier in my use of C I was annoyed by dealing with unsigned integers and I wanted everything to be signed. However now that I have a more complete understanding of type systems, assembly, and information theory, I find that unsigned integers make sense in the correct context. The ability to have negative numbers when it only makes sense that the value could be positive is a liability, not an asset.

The argument about int being big enough only works if int is some kind of null hypothesis default type for things, and size_t is the weird new challenger. In fact, it's the other way around, we would need to justify using int instead of size_t for array indexes.

andrewrk avatar Nov 02 '16 17:11 andrewrk

I've also set this precedent in the zig compiler code (written in C/C++) itself: https://github.com/andrewrk/zig/commit/3239b3cb6957aa5f5f04c9d7d9035da14ccd0741

andrewrk avatar Nov 02 '16 18:11 andrewrk

The ability to have negative numbers when it only makes sense that the value could be positive is a liability, not an asset.

You can trivially assign a semantic value to negative values, be it a -1 for an error/invalid/sentinel, or arbitrary negative numbers to indicate e.g. positions before the end of a string (as Perl does). The lack of a sign bit isn't even due to a sign bit not being needed/sensible, but because that bit is needed to express the full range that the type needs to handle (e.g. size_t on a 32-bit system needs to top out at 4GB, not 2GB).

The argument about int being big enough only works if int is some kind of null hypothesis default type for things, and size_t is the weird new challenger. In fact, it's the other way around, we would need to justify using int instead of size_t for array indexes.

int is already the "null hypothesis" integer type in C. It's the default function return type, and the default integer-promotion type for anything smaller.

My understanding is that the C standard doesn't care whether you use a char, short, int or long long as an array subscript. It's only a matter of practicality whether the type you use is big enough to handle the array in question. And as long as we're in the practical realm, the signedness gotcha is a thing.

It's unfortunate that there are some gotchas, but that's the world of C that we live in.

C is not forcing you to do anything here. You can make a case for this change, but the language standard won't help it.

iskunk avatar Nov 02 '16 19:11 iskunk

@euloanty hey, it's not OK to be rude to other users. It's OK to have strong opinions, but passion is not a substitute for clear arguments. Try to keep it technical and on topic. "You are stupid" is harmful to the community. Try to empathize with people who disagree with you - it can help you present a stronger argument.

andrewrk avatar Jan 23 '19 20:01 andrewrk

@euloanty

You are stupid for (size_t i = soundio_thing.foo_count; i != -1; --i ) works

Hmmm...

Correct me if I'm wrong, but isn't size_t unsigned? If so then the i != -1 is not entirely correct because it may never be true. The only case where it is true is the case where i is implicitly converted to some signed type (most likely int) and it's bit-pattern is interpreted as -1. I guess it is the case in all or at least most implementations, but it is not a clear solution.

Innokentiy-Alaytsev avatar Jan 23 '19 20:01 Innokentiy-Alaytsev

This is my favorite way to loop backwards with an unsigned int:

for (size_t i = soundio_thing.foo_count; i != 0;) {
    i -= 1;
    // your logic
}

All you have to do is move the i -= 1 to the first line of the loop, and it even works as expected with continue.

andrewrk avatar Jan 23 '19 20:01 andrewrk

@euloanty

I am sorry for being wrong, thank you for pointing me at my ignorance.

Innokentiy-Alaytsev avatar Jan 24 '19 01:01 Innokentiy-Alaytsev

@Innokentiy-Alaytsev please stay on topic. @euloanty is no longer welcome to participate in libsoundio development for being rude.

andrewrk avatar Jan 24 '19 01:01 andrewrk