firebird
firebird copied to clipboard
Add padding for on-disk structures to ensure their sizes are always guaranteed
In order to guarantee that the on-disk structures have always the expected sizes, we should use padding fields within these structs rather than relying on the native alignment of the target architecture which can result in unexpected sizes for architectures with unusual native alignment.
First of all, what architectures with "unusual native alignment" we're talking about? Can you provide examples?
m68k has 16-bits native alignment despite being a 32-bits architecture.
Second, your padding after all *[1] array-like fields is wrong, as these last elements are just an indication of variable-sized tails, it makes no sense to align the appropriate structures' size.
If these arrays are variable-sized, why does the build system then check the size of the ODS structs for expected sizes?
Using padding to make struct sizes not depend on a particular alignment is not uncommon, see for example the code of the NTP daemon: http://bk.ntp.org/ntp-dev/include/ntp_request.h?PAGE=anno&REV=5a7e3586DWwVi6dB215LtzM3E0ktSQ
On 17.07.2019 12:48, John Paul Adrian Glaubitz wrote:
Second, your padding after all *[1] array-like fields is wrong, as these last elements are just an indication of variable-sized tails, it makes no sense to align the appropriate structures' size.
If these arrays are variable-sized, why does the build system then check the size of the ODS structs for expected sizes?
As far as I remember not only sizes but also offsets of all members. That's done to make sure databases format is binary compatible between various HW and OS at as early step as possible.
Using padding to make struct sizes not depend on a particular alignment is not uncommon, see for example the code of the NTP daemon: http://bk.ntp.org/ntp-dev/include/ntp_request.h?PAGE=anno&REV=5a7e3586DWwVi6dB215LtzM3E0ktSQ
Certainly, but in this case it makes no sense. At least in suggested form.
As far as I remember not only sizes but also offsets of all members. That's done to make sure databases format is binary compatible between various HW and OS at as early step as possible.
Yes. But my point is: You are saying that there are arrays in the structs which can vary in size, so adding an explicit padding is wrong as that would cause conflicts with dynamic array sizes. However, if the array sizes are dynamic, it wouldn't be possible to guarantee the structs have a guaranteed size. That's only possible if the struct consists of members only of static sizes and offsets.
And, yes, I understand the size and offset checks are necessary to guarantee binary format compatibility. This is why I think it makes more sense to use padding so that you are not relying on the compiler to determine the size of the struct but explicitly do that yourself.
Certainly, but in this case it makes no sense. At least in suggested form.
I'm not sure I understand the problem. The padding I put in there are currently put in there by the compiler, implicitly. All what I am doing is add the padding explicitly which is why the change does not break the builds on other architectures and also does not change the on-disk format.
The padding is already there, it's just implicit and therefore invisible. I'm just making it visible which also makes the code easier to read.
On 17.07.2019 13:18, John Paul Adrian Glaubitz wrote:
I'm not sure I understand the problem.
sorry :(
The padding I put in there are currently put in there by the compiler, implicitly. All what I am doing is add the padding explicitly which is why the change does not break the builds on other architectures and also does not change the on-disk format.
Try to build with your changes and open existing database after it ;)
I'm not sure I understand the problem.
Padding in the middle of structures to ensure offsets of members is fine. Padding at the end of structures to ensure sizes is a wrong thing in some cases. Dmitry told you that.
Padding in the middle of structures to ensure offsets of members is fine. Padding at the end of structures to ensure sizes is a wrong thing in some cases. Dmitry told you that.
But how is it possible then to guarantee the size of a struct if the end part of the struct can vary in size?
This is the part I don't understand. Looking at src/misc/ods.txt, all structs in there have fixed struct sizes which would conflict with dynamic arrays or am I missing something?
Try to build with your changes and open existing database after it ;)
Okay, I will give it a try later today to verify that.
But how is it possible then to guarantee the size of a struct if the end part of the struct can vary in size?
For such structures this guarance is not needed.
PS: I still think that a set of static_asserts would be better for check instead of ods.txt.
For such structures this guarance is not needed.
Why is the size checked then and the build fails if mismatched?
From: https://buildd.debian.org/status/fetch.php?pkg=firebird3.0&arch=m68k&ver=3.0.5.33100.ds4-2&stamp=1549631205&raw=0
diff -u /<<PKGBUILDDIR>>/src/misc/ods.txt /<<PKGBUILDDIR>>/gen/ods.txt
--- /<<PKGBUILDDIR>>/src/misc/ods.txt 2019-02-02 00:04:21.000000000 +0000
+++ /<<PKGBUILDDIR>>/gen/ods.txt 2019-02-08 13:06:41.000000000 +0000
@@ -58,7 +58,7 @@
irtd_itype 2
irtd_selectivity 4
- *** struct header_page 136
+ *** struct header_page 134
hdr_header 0
hdr_page_size 16
hdr_ods_version 18
@@ -88,7 +88,7 @@
hdr_tra_high 124
hdr_data 132
- *** struct page_inv_page 32
+ *** struct page_inv_page 30
pip_header 0
pip_min 16
pip_extent 20
@@ -100,16 +100,16 @@
scn_sequence 16
scn_pages 20
- *** struct pointer_page 36
+ *** struct pointer_page 34
ppg_header 0
ppg_sequence 16
ppg_next 20
ppg_count 24
ppg_relation 26
ppg_min_space 28
-ppg_page 32
+ppg_page 30
- *** struct tx_inv_page 24
+ *** struct tx_inv_page 22
tip_header 0
tip_next 16
tip_transactions 20
@@ -120,7 +120,7 @@
gpg_dummy1 20
gpg_values 24
- *** struct rhd 16
+ *** struct rhd 14
rhd_transaction 0
rhd_b_page 4
rhd_b_line 8
@@ -128,7 +128,7 @@
rhd_format 12
rhd_data 13
- *** struct rhde 20
+ *** struct rhde 18
rhde_transaction 0
rhde_b_page 4
rhde_b_line 8
@@ -148,18 +148,18 @@
rhdf_f_line 20
rhdf_data 22
- *** struct blh 32
+ *** struct blh 30
blh_lead_page 0
blh_max_sequence 4
blh_max_segment 8
blh_flags 10
blh_level 12
-blh_count 16
-blh_length 20
-blh_sub_type 24
-blh_charset 26
-blh_unused 27
-blh_page 28
+blh_count 14
+blh_length 18
+blh_sub_type 22
+blh_charset 24
+blh_unused 25
+blh_page 26
*** struct Descriptor 12
dsc_dtype 0
make[4]: *** [Makefile:250: preliminaryCheck] Error 1
make[4]: Leaving directory '/<<PKGBUILDDIR>>/gen'
make[3]: *** [Makefile:178: master_process] Error 2
make[3]: Leaving directory '/<<PKGBUILDDIR>>/gen'
make[2]: *** [Makefile:66: firebird] Error 2
make[2]: Leaving directory '/<<PKGBUILDDIR>>/gen'
make[1]: *** [Makefile:6: firebird] Error 2
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
make: *** [debian/rules:152: build-server-stamp] Error 2
Because Alex did it this way. See my PS above.
I'd say that build-time checks for sizeof(struct) is redundant for all structs with variable-length tail. Checking just all member offsets is absolutely enough. If this is fixed/improved, then padding at the end will become unnecessary.
On 17.07.2019 13:31, Dimitry Sibiryakov wrote:
PS: I still think that a set of static_asserts would be better for check instead of ods.txt.
For master certainly (PR is welcome), in B3_0 we can't use them.
I'd say that build-time checks for sizeof(struct) is redundant for all structs with variable-length tail.
Okay, that resolves the contradiction. Thanks.
Checking just all member offsets is absolutely enough. If this is fixed/improved, then padding at the end will become unnecessary.
I agree. I can look into whipping up a patch for that.
Funny, I just found this PR again as I stumbled across this bug again in the firebird3.0
package, firebird4.0
has not been accepted into Debian unstable yet.
Have there been any fundamental changes to this in Firebird 4.0? It seems like the ods.txt
file has been removed.
The ods.txt
file has become static asserts now, but the addition of explicit padding is still necessary:
As far as I remember not only sizes but also offsets of all members. That's done to make sure databases format is binary compatible between various HW and OS at as early step as possible.
Indeed. What you currently have:
struct foo {
char a;
int b;
};
static_assert(offsetof(struct foo, b) == 4);
This fails on m68k (b
is put at offset 2).
Adrian’s changes introduce this:
struct foo {
char a;
short padding;
int b;
};
This makes things pass on m68k and does not change other architectures.
@aafemt noticed that we really should have three bytes padding:
struct foo {
char a;
char padding1;
short padding2;
int b;
};
// - or -
struct foo {
char a;
char padding[3];
int b;
};
This (either one) is indeed necessary to fix this for all possible systems (that do not overshoot padding into the other direction) per the ISO C standard, and it (the lower one, not the upper one) is the way I would write this: make all padding that is relied on explicit.
I’ve taken @glaubitz ’ last patch to make the padding explicit and updated it for the latest firebird3.0 we have in Debian. I’ve yet to test this (m68k machines are slow, and not all build dependencies are available yet due to the transition to 64 bit); I’ll report back once tested (I hope I remember this).
For firebird4 which has the static asserts, the patch can get much smaller because the sizeof
of the struct with the flexible array member at the end is no longer checked. (You’ll want to look into using C99 FAM when present for these, too; offsetof
for them can be checked, but allocation needs malloc(MAX(sizeof(struct header_page), offsetof(struct header_page, hdr_data)))
according to what I gathered from Jₑₙₛ Gustedt’s notes on that topic. This expression will also work with pre-C99 one-sized dummy tail elements.)
I would use alignas
instead of padding in this case.
This has two downsides:
-
alignas(alignof(int))
still aligns to 2, not 4, on m68k -
alignas
is not in C89, not even in C99
Oops, sorry, I missed that this PR is for version 3.
The changes from the PR should not be applied.
Instead, the patch from https://github.com/FirebirdSQL/firebird/pull/217#issuecomment-2005120147 should be applied. This does not change the on-disc structure and makes the padding explicit. I can confirm it builds.
Yes, this is for 3.0… we tackle 4.0 when we come to it, but you can probably apply parts of this already.