flatcc
flatcc copied to clipboard
Minimal reproduction test case for misaligned vector of double
I believe I have narrowed down the problem with #210 further. It appears to happen only for buffers with size.
I've made new doublevec_test with a very minimal schema and two test cases, one run without size prefix and one with.. the latter fails with "vector header out of range or unaligned".
ctest --verbose output:
20: doublevec table:
20: 00000000 04 00 00 00 f4 ff ff ff 04 00 00 00 00 00 00 00 |................|
20: 00000010 06 00 08 00 04 00 |......|
20: doublevec table with size:
20: 00000000 1a 00 00 00 08 00 00 00 00 00 00 00 f4 ff ff ff |................|
20: 00000010 04 00 00 00 00 00 00 00 06 00 08 00 04 00 |..............|
20: running debug doublevec test
20: doublevec buffer failed to verify, got: vector header out of range or unaligned
The table definition is:
table DoubleVec {
a: [double];
}
And the gen code is creating as root and an empty vector for a as it does not seem to matter if I fill it or not with something.
If I create two variants without setting the a vector at all both verifies and looks like this:
1: doublevec table:
1: 00000000 04 00 00 00 fc ff ff ff 04 00 04 00 |............|
1: doublevec table with size:
1: 00000000 0c 00 00 00 04 00 00 00 fc ff ff ff 04 00 04 00 |................|
Great reproduction case.
First note that it could be completely random that one buffer works and the other doesn't if for example that aligment argument given the the create vector call was 4 instead of the expected 8.
Based on you first dump:
In the buffer without size prefix, the vector size field starts at offsets 0x000c (12 dec) and contains 4 zero bytes. In the buffer with size prefix, the vector size field starts at offset 0x0014 (20 dec) and contains 4 zero bytes.
Observations: this size field must be aligned to 4 bytes. Because the vector is a double, the byte that follows the size field must be aligned to 8 bytes.
In the buffer without size field, the following address is 0x0010 (16 dec) which is a multiple of 8. In the buffer with size field the following address is 0x0018 (24 dec) which is a multple of 8.
The vectors are therefore correctly aligned in both cases, assuming I have made no mistake in the above.
In the buffer without size prefix there is no zero padding after the header. That is coincidental because it just happened to not be necessary. The table starts at offset 0x0004 which is stored as the root table offset in the first 4 bytes of the table header at offset 0x0000. The table starts with the vtable offset which is the negative value "f4 ff ff ff" or 0xfffffff4, or -12 dec, meaning the vtable appears 12 bytes later. That is just how FlatBuffers work. And indeed we find the 6 byte long vtable at that location.
In the buffer with size prefix, there are 4 bytes of zero padding at offset 0x0008 (8 dec). The table starts at offset 0x000c (12 dec) storing a negative -12 value to the vtable 12 bytes later where the 6 byte vtable is found. The header stores the buffer size at offset 0x0000 with the value 0x0000001a (26 dec) which is the length of the printed buffer excluding the 4 byte size prefix itself. (I do not recall if the size prefix should include itself or not). ~~The root table offset is found at offset 0x0004 (4 dec) and contains the location 0x0008 (8 dec) - this wrong since this is the zero padded location.~~ The root table offset is found at offset 0x0004 (4 dec) and contains the offset 0x0008 (8 dec) which must be added to the buffer start after the size field, so the effective root table location is 0x000c (12 dec) which is where the vtable offset is located as expected.
~~It appears that the root table offset is wrong when a size prefixed buffer needs global zero padding after the header.~~
EDIT: The buffer appears to valid in both cases and the problem is likely with the verifier.
I made a mistake in my conclusion - see edit above.
To further confirm that the buffer is valid, try access a vector with actual data in the buffer with size prefix and confirm the pointer addresses and content when reading the data with ordinary reader interface. Make sure the buffer is aligned including the size prefix.
Thanks for your analysis @mikkelfj. I will see if I can confirm this by verifying the buffers with the C++ implementation.
I don't think C++ verifies aligment.
@mikkelfj ok :( But I went ahead and did it anyway and got strange results...
The "empty" doublevec table verifies, i.e this passes:
const unsigned char flatbuf2[] =
{
0x04, 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0xff, 0x04, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier2((uint8_t *) flatbuf2, sizeof(flatbuf2));
TEST_EQ(VerifyDoubleVecBuffer(verifier2), true);
The buffer with an empty double vector in it but still without size prefix does not pass:
const unsigned char flatbuf3[] =
{
0x04, 0x00, 0x00, 0x00, 0xf4, 0xff, 0xff, 0xff, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x06, 0x00, 0x08, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier3((uint8_t *) flatbuf3, sizeof(flatbuf3));
TEST_EQ(VerifyDoubleVecBuffer(verifier3), true);
I also tried producing buffers with the same content using C++ to compare. I know it doesn't have to be the same binary representation for the same content and I don't know enough about flatbuffers to understand it all yet but nevertheless:
04 00 00 00 FC FF FF FF 04 00 04 00
- flatcc, doublevec empty no size prefix
08 00 00 00 04 00 04 00 04 00 00 00
- C++, doublevec empty no size prefix
Both the above verifies in C++ verifier.
04 00 00 00 F4 FF FF FF 04 00 00 00 00 00 00 00 06 00 08 00 04 00
- flatcc, doublevec 0 len a vector no size prefix
0C 00 00 00 00 00 06 00 08 00 04 00 06 00 00 00 04 00 00 00 00 00 00 00
- C++, doublevec 0 len a vector no size prefix
Only the second verifies in the C++ verifier and it's two bytes longer.. and considering that the doublevec empty no size prefix is 12 bytes both produced with flatcc and C++ it does not seem to me C++ zero pads to 8 bytes?
@mikkelfj while I'm far from sure about my findings above and not sure exactly what they mean it doesn't seem to confirm that the problem we see here with flatcc is restricted to the verifier.
The flatbuf3 buffer in your example has a size prefix of the vector which is 4 (at byte offset 8 in the buffer). That is followed by 4 zero bytes. If this is a byte vector it would work, but it is a double vector it makes no sense. The size is not a multiple of 8.
@mikkelfj flatbuf3 buffer was created with this flatcc code (test case in this PR), same as initially presented in this PR that you said was valid but that the verifier was to blame:
DoubleVec_start_as_root(B);
DoubleVec_a_create(B, 0, 0);
DoubleVec_end_as_root(B);
Are you saying we are back to conclude that there is indeed a bug with double vectors in flatcc?
Correction: the size field is not 4. 4 is the offset to the vector. So it is correct. The following zero bytes is the length field. In flatbuf3 the alignment is correct because the size starts at offset 12 and the empty vector payload starts at offset 16 which is a multiple of 8.
In flatbuf2 the offset to the vector to be "04 00 04 00" which is wrong. It should be "04 00 00 00" and it should be followed by 4 zeroes and then by a vtable. Someting is missing.
In your comment starting with "I also tried producing buffers", I don't understand the first dumps. The second dumps: The flatcc ( C ) and flatc ( C++ ) buffer, the buffers have the same content, but the vtable is stored before the table in C++. The results in more padding required which explains the longer length. In both cases the offset the vector is 4 and points to a size field with zeroes that are a followed by an offset that is a multiple of 8. So both look correct.
As to the first dumps, I'll need more time but it appears to have "04 00 04 00" similar to before.
I think the first dumps are garbage.
I don't have time trace this right now, but I doubt that you are listing the buffers correctly. Also, without proper format hex dumps as in your previous posts, it is very difficul to assess alignment.
Hmm ok.. strange.. I tried to be careful with the dumps but I had to find something to do it in C++ that's why the format isn't exactly the same.
One interesting fact though is that functionally this all appears to be working if I stay in flatcc land (correct roundtrip write/read and comparison) and avoiding the verifier - but causes alignment and possibly other memory issues when subjected to sanitizers. So whatever problem there is, it is likely systemic.
https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md
might help
Some of the dumps are missing vtables like 06 00 08 00 04 00. That, and the strange 04 00 04 00 sequence is why I doubt the dumps.
There are two things that surprise me with 0x04, 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0xff, 0x04, 0x00, 0x04, 0x00
- The offset is a negative number, maybe it changed but AFAIK offset are defined as
u32
which is a waste because it is caped by2^31
soi32
makes more sense, but it is not compliant with the spec. - The vTable is broken. First
0x04, 0x00
of the vTable determine the size of the vTable, which is correct in this case, but actually shouldn't be. A vTable normally contains of following values first 2 bytes (vTable size), second 2 bytes (table data size) and then followed by 2 bytes relative offsets from table offset to the property values. In the buffer at hand there are only vTable size and kind table data size, 4 being the size of the offset to vTable itself. But I don't see the the relative offset to the property. I guess it can be inferred that as the vTable is just 4 bytes long, the relative property offsets are0
. But that's quite implicit. I would expect the0x00
to be part of the vTable and the padding which would have to be added in case of the odd number fo properties.
That sad in the C++ version: 0x08 0x00 0x00 0x00 0x04 0x00 0x04 0x00 0x04 0x00 0x00 0x00
The vTable is also "cropped" but the vTable offset is a positive number as I would expect.
@mzaks The offsets to the vtables are soffset_t which is a signed 32-bit value. The value is subtracted from the location where the offset is stored. Since C++ generally stores vtables before the tables that use them, the soffsets are typically small positive numbers. FlatCC stores vtables clustered together last in the buffer, so it stores negative values at table start.
04 00 00 00 F4 FF FF FF 04 00 00 00 00 00 00 00 06 00 08 00 04 00
- flatcc, doublevec 0 len a vector no size prefix
Makes sense to me even though there is again the case of a negative offset, the table itself is:
F4 FF FF FF 04 00 00 00
which is vTable offset + vector offset. the central 00 00 00 00
is the size of the vector. Then it says that vTabel is of size 6, the size of the table data is 8 and the relative offset to the first property is 4. Which is all correct values.
0C 00 00 00 00 00 06 00 08 00 04 00 06 00 00 00 04 00 00 00 00 00 00 00
- C++, doublevec 0 len a vector no size prefix
is as far as I can tell correct.
First four bytes 0C 00 00 00
is offset to table data. The table data is 06 00 00 00 04 00 00 00
. First value 6 is the offset to vTable, which points us to 06 00 08 00 04
the second value 4 is the relative pointer to the vector which contains only of 00 00 00 00 00
because the size is 0. 00 00
between offset to table and vTable data are padding.
@mzaks
The vTable is broken. First 0x04, 0x00 of the vTable determine the size of the vTable, which is correct in this case,
That is not the vtable. That part of the buffer is missing. The 04 00 04 00 is still a strange value.
Ah, @mikkelfj I think you are correct, sorry. It is i32
because of the vTable reuse feature. I forgot about it.
Your next analysis is correct from a cursory look. That is why I earlier said the second set of buffers appear correct, but the first do not make any sense.
Your next analysis is correct from a cursory look. That is why I earlier said the second set of buffers appear correct, but the first do not make any sense.
it makes sense if 04 00 04 00
is a vTable which does not provide the relative property offsets ;).
it makes sense if 04 00 04 00 is a vTable which does not provide the relative property offsets ;).
Yes it appears you are right.
@bjornharrtell that means the buffer dumps are probably correct. In that case they should also verify since the null tables cannot be misaligned as they are not present.
Well, I have double checked and I cannot get 04 00 00 00 F4 FF FF FF 04 00 00 00 00 00 00 00 06 00 08 00 04 00
- flatcc, doublevec 0 len a vector no size prefix to verify with the C++ verifier (as done in https://github.com/dvidelabs/flatcc/pull/213#issuecomment-913147207).
And then we have the size prefixed variants from the initial description:
doublevec table with size (with empty vector):
20: doublevec table with size:
20: 00000000 1a 00 00 00 08 00 00 00 00 00 00 00 f4 ff ff ff |................|
20: 00000010 04 00 00 00 00 00 00 00 06 00 08 00 04 00 |..............|
doublevec table with size (without vector):
1: doublevec table with size:
1: 00000000 0c 00 00 00 04 00 00 00 fc ff ff ff 04 00 04 00 |................|
The "doublevec table with size (with empty vector)" fails to verify:
const unsigned char flatbuf4[] =
{
0x1a, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf4, 0xff, 0xff, 0xff,
0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x08, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier4((uint8_t *) flatbuf4, sizeof(flatbuf4));
TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier4), true);
The "doublevec table with size (without vector):" verifies with the C++ verifier as follows:
const unsigned char flatbuf5[] =
{
0x0c, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0xff, 0x04, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier5((uint8_t *) flatbuf5, sizeof(flatbuf5));
TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier5), true);
So, same C++ verifier results with size prefixed variants.
I'd like to know more about why the C++ verifier fails.
As to C++ verifier failing on "The buffer with an empty double vector in it but still without size prefix does not pass:"
0x04, 0x00, 0x00, 0x00, 0xf4, 0xff, 0xff, 0xff, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x06, 0x00, 0x08, 0x00, 0x04, 0x00
I really cannot see what would be wrong with that buffer except that the verifier might be looking for an optional file identifier. Without more information it is hard to tell.
As for those with size prefixes, perhaps we should focus on the one without first, but still, if you can get an error message, assertion or breakpoint clue ...
Ohh
If C++ verifies alignment, then you cannot verify a buffer in a const char array. It will not be aligned. You need a uint64_t array or use alignas(8) attribute. The same goes for flatcc but since that would be a different process, it might get lucky and find aligned data.
Unfortunately the C++ verifier does not seem to have any means of error reporting unless I'm missing something and I need to refresh skills in gdb to provide any other leads.
About the const char array, if I memcpy it to a freshly allocated buffer would that suffice?
yes
Ok so back to the non size prefixed buffer with zero sized vector in it.. still do not verify (not sure I got the malloc / memcpy part right for alignment issues.. not sure how I otherwise can allocate aligned buffer in C++):
const unsigned char flatbuf3chars[] =
{
0x04, 0x00, 0x00, 0x00, 0xf4, 0xff, 0xff, 0xff, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x06, 0x00, 0x08, 0x00, 0x04, 0x00
};
void *flatbuf3 = malloc(sizeof(flatbuf3chars));
memcpy(flatbuf3, flatbuf3chars, sizeof(flatbuf3chars));
flatbuffers::Verifier verifier3((uint8_t *) flatbuf3, sizeof(flatbuf3chars));
TEST_EQ(VerifyDoubleVecBuffer(verifier3), true);