Replace EM_BOOL with normal C/C++ bool type. NFC
There is no need that I can think of to use a custom type/macro here.
This also reduces the size of several structs due to the fact that the C/C++ bool type is smaller than int.
Keep EM_BOOL/EM_TRUE/EM_FALSE around for backwards compat but don't use them internally anymore.
This change was created using sed to substitutions
Updated to include struct size saves and code size savings.
I could do this in two phases perhaps:
- Narrow the type of EM_BOOL
- Remove the use of EM_BOOL
Why do struct sizes change? I'd expect an int and an unpacked bool to both take 32 bits. Is that wrong?
Why do struct sizes change? I'd expect an int and an unpacked bool to both take 32 bits. Is that wrong?
I guess bool is just a single byte in C/C++?
Indeed this program printf 1 1 on my desktop system:
$ cat test.c
#include <stdbool.h>
#include <stdio.h>
int main() {
printf("%ld %ld\n", sizeof(bool), _Alignof(bool));
return 0;
}
Why do struct sizes change? I'd expect an int and an unpacked bool to both take 32 bits. Is that wrong?
I guess
boolis just a single byte in C/C++?Indeed this program printf
1 1on my desktop system:$ cat test.c #include <stdbool.h> #include <stdio.h> int main() { printf("%ld %ld\n", sizeof(bool), _Alignof(bool)); return 0; }
(same for C++ BTW)
By itself it's 1 byte, I guess, but inside a struct the field after it might be aligned:
#include <stdbool.h>
#include <stdio.h>
class C {
bool x;
int y;
};
int main() {
printf("%ld\n", sizeof(C));
return 0;
}
That prints 8 for me, so 4 bytes for each field. I'd expect that to be the common case, I'm surprised we save anything actually...
Anyhow, yeah, if this causes struct layout changes then splitting the PR as much as possible sgtm.
By itself it's 1 byte, I guess, but inside a struct the field after it might be aligned:
#include <stdbool.h> #include <stdio.h> class C { bool x; int y; }; int main() { printf("%ld\n", sizeof(C)); return 0; }That prints
8for me, so 4 bytes for each field. I'd expect that to be the common case, I'm surprised we save anything actually...
Right, that is just normal struct layout rules. If you put the bool last you will see difference. Or if you have several booleans toegther in the struct.
Lowering the alignment of a type can for sure shrink struct sizes. That is the primary use of the alignment of a given type. Higher alignment == worse for struct size.
Rebased now that #22157 has landed. I'm planning on waiting for a release or two before landing this larger change.
Added a test.
Thanks - which is it though? (this is a pretty big diff and I don't see a separate commit for it, this would have been a nice case to not squash actually I think?)
Added test as a separate commit
Thanks, lgtm.
This change is now ready to land I think
Just to note, this looks like a massive user-facing breaking ABI change. The previous sizeof(EM_BOOL) change from 4 bytes to 1 byte also was a breaking ABI change, and it regressed my Emscripten downstream library at https://github.com/juj/wasm_webgpu/issues/46 (I was lucky to have added #pragma clang diagnostic error "-Wpadded" in my code, otherwise it would have invited silent failure). There could be projects out there that have used EM_BOOL, and their pre-compiled code will break on this change.
Looking at stdbool.h, it has
#ifndef _STDBOOL_H
#define _STDBOOL_H
#ifndef __cplusplus
#define true 1
#define false 0
#define bool _Bool
#endif
#define __bool_true_false_are_defined 1
#endif
What happens if C code is compiled with -std=c89 or -std=c90 that does not yet have the _Bool data type? Is there a #define somewhere else that lets that code compile?
Looking at stdbool.h, it has
#ifndef _STDBOOL_H #define _STDBOOL_H #ifndef __cplusplus #define true 1 #define false 0 #define bool _Bool #endif #define __bool_true_false_are_defined 1 #endifWhat happens if C code is compiled with -std=c89 or -std=c90 that does not yet have the
_Booldata type? Is there a#definesomewhere else that lets that code compile?
Musl's <stdbool.h> continues to work regardless of which C standard you use:
$ cat test.c
#include <stdbool.h>
bool x = true;
$ ./emcc -std=c89 test.c
Just to note, this looks like a massive user-facing breaking ABI change. The previous sizeof(EM_BOOL) change from 4 bytes to 1 byte also was a breaking ABI change, and it regressed my Emscripten downstream library at juj/wasm_webgpu#46 (I was lucky to have added
#pragma clang diagnostic error "-Wpadded"in my code, otherwise it would have invited silent failure). There could be projects out there that have used EM_BOOL, and their pre-compiled code will break on this change.
I think this particular change is an NFC. That change that could be potentially breaking is the one that already landed and was already released in 3.1.62: #22157. Right?
Just to note, this looks like a massive user-facing breaking ABI change. The previous sizeof(EM_BOOL) change from 4 bytes to 1 byte also was a breaking ABI change, and it regressed my Emscripten downstream library at juj/wasm_webgpu#46 (I was lucky to have added
#pragma clang diagnostic error "-Wpadded"in my code, otherwise it would have invited silent failure). There could be projects out there that have used EM_BOOL, and their pre-compiled code will break on this change.I think this particular change is an NFC. That change that could be potentially breaking is the one that already landed and was already released in 3.1.62: #22157. Right?
If you like we can wait a few more releases to see if any other fallout is reported.
So far I have only heard of one regression which was someone who was using the int type for a callback return type rather then EM_BOOL.
Just to note, this looks like a massive user-facing breaking ABI change. The previous sizeof(EM_BOOL) change from 4 bytes to 1 byte also was a breaking ABI change, and it regressed my Emscripten downstream library at juj/wasm_webgpu#46 (I was lucky to have added
#pragma clang diagnostic error "-Wpadded"in my code, otherwise it would have invited silent failure). There could be projects out there that have used EM_BOOL, and their pre-compiled code will break on this change.
Just in case I'm missing something, here are the 3 types of possible fallout that I can think of that might result from this type of change:
- Somebody linking object files compiled with different versions of emscripten (struct sizes disagree)
- Somebody hardcoding struct offsets (for example in JS code that doesn't use macros for this).
- Somebody passing a callback function using
intin the signature whereEM_BOOLshould be.
Are there any others that we should also consider?
Note: Only (1) and (2) above can result in silent failure whereas (3) is a compiler error.
here are the 3 types of possible fallout that I can think of that might result
I think these scenarios are probably accurate. In (1) I would note that it is not necessarily only struct sizes that disagree, but also C++ name mangling that changes, since EM_BOOL used to be an int, but becomes a bool.. so I presume that LLVM will give the C++ symbol name a different mangling.
So if someone has a codebase where they define a function with EM_BOOL in the signature in a .cpp file, then they would not have ABI compatibility to link against code that uses those functions compiled with previous version of Emscripten.
2. Somebody hardcoding struct offsets (for example in JS code that doesn't use macros for this).
Unless I have missed a development(?), Emscripten end users cannot use these {{{ C_STRUCTS.emscripten_fetch_t.numBytes }}} types of directives. These are an Emscripten authors only thing. So this "somebody" would be all Emscripten users?
That change that could be potentially breaking is the one that already landed and was already released in 3.1.62: #22157. Right?
Yeah, the actual breakage occurred in #22157. After that, this renaming of EM_BOOL uses to bool looks cosmetic.
This pair of changes looks like first #22157 broke ABI compatibility of EM_BOOL, and then this PR is moving Emscripten away from using EM_BOOL at all. I wonder if the first change was needed for this second change to occur?
If I were to rewind back, I wonder if it might be less disrupting to keep EM_BOOL as-is and not change it at all, and then directly switch Emscripten headers to start using bool in function signatures where there is no risk of compatibility issues, and in new APIs?
Although given that this has already happened, I don't wish to cause any more disruptions.
Unless I have missed a development(?), Emscripten end users cannot use these
{{{ C_STRUCTS.emscripten_fetch_t.numBytes }}}types of directives. These are an Emscripten authors only thing. So this "somebody" would be all Emscripten users?
Emscripten users can use the {{{ C_STRUCTS }}} stuff. The limitation is that they cannot add new structs of their own that don't exist in emscripten. So I would phrase that as "emscripten users who uses EM_BOOL in a struct that they define and then use a hardcoded offset in JS to access members of this stuct". This is a very long way from all emscripten users. In fact, I've yet to hear of a single user effected in this way.
On that matter there has been some progress towards allowing users to extend C_STRUCTS system. The first part of this already landed in: #22251.
In fact, I've yet to hear of a single user effected in this way.
I was a single user who was affected this way! That change totally broke my wasm_webgpu project for my downstream users, and I didn't know about it until someone reported it as a bug to me.
On that matter there has been some progress towards allowing users to extend C_STRUCTS system. The first part of this already landed in: #22251.
That's cool - it would definitely be nice to have "equal" footing for Emscripten end user JS libraries and Emscripten system JS libraries. I understand that user JS libraries can refer to system struct fields, though the usefulness of that is quite limited. Rather, end users most often would like to use the offset mechanism for filling structs in their own projects.
The reason I was using EM_BOOL was that it was already a handy way for the "system" to provide a standard boolean type.
In fact, I've yet to hear of a single user effected in this way.
I was a single user who was affected this way! That change totally broke my wasm_webgpu project for my downstream users, and I didn't know about it until someone reported it as a bug to me.
I see, sorry I didn't process that this was the type of failure you had. Was it this exact issue? i.e. hardcoded offsets in your JS code?
To marshall WebGPU API calls, structs on C side need to be converted to JS side descriptor objects, e.g.
and JS code:
and I had assumed that the memory layout would stay fixed.
Fortunately I had added this at some point though: https://github.com/juj/wasm_webgpu/commit/334517453d0bef1b644381663dc1c3a79a959f07