mpack icon indicating copy to clipboard operation
mpack copied to clipboard

builder page allocation leak and segfault

Open momentarylapse opened this issue 2 years ago • 6 comments

hi, I ran into a segfault with the following code:

mpack_writer_t writer;

char *output_data = nullptr;
size_t output_size = 0;
mpack_writer_init_growable(&writer, &output_data, &output_size);


const int N_OUTER = 10;
const int N_INNER = 100;

mpack_start_array(&writer, N_OUTER);
for (int k=0; k<N_OUTER; k++) {
	mpack_build_map(&writer);
	mpack_write_str(&writer, "key", 3);
	mpack_write_u64(&writer, k*k);
	mpack_write_str(&writer, "value", 5);
	mpack_start_array(&writer, N_INNER);
	for (int i=0; i<N_INNER; i++) {
		mpack_build_map(&writer);
		mpack_write_str(&writer, "value0", 6);
		mpack_write_i64(&writer, i);
		mpack_write_str(&writer, "value1", 6);
		mpack_write_i64(&writer, i*k);
		mpack_complete_map(&writer);
	}
	mpack_finish_array(&writer);
	mpack_complete_map(&writer); // <- segfault here
}
mpack_finish_array(&writer);


mpack_writer_destroy(&writer);

At the end of the 5th outer loop, the call to mpack_complete_map() will corrupt the internal memory of the builder, resulting in a segfault. Some mild debugging traced the issue back to the page allocation mechanism. According to the logs, we might be freeing one page and using the next without initializing it.

The problem vanishes when writing different integer values. It seems to be triggered only by very specific alignment conditions.

I would appreciate it ,if you could confirm the findings or give me a clue of any mismatching calls in my code. Thanks!

momentarylapse avatar Feb 09 '22 17:02 momentarylapse

Fixed, sorry about this. It's an unfortunate off-by-one bug which is triggered on the rare occasion that a build ends up exactly at the end of a 4096 page boundary. I'm a little surprised I never encountered this bug in my own projects; I guess I tend to keep my builds fairly small so my nested builds never roll over a page.

I'm having trouble crafting a unit test that will reproduce this. The tests currently set a very small page size to test various wrapping behaviors but this size makes it impossible for a build to be aligned at the end. I'll have to make the page size vary by unit test. I will keep this open until it's properly unit tested.

ludocode avatar Feb 14 '22 08:02 ludocode

Thanks for the very quick fix! can't we add the current example above to the unit tests as a regression test, at least until a more specific way to test this has been implemented?

pizzard avatar Feb 14 '22 12:02 pizzard

Your example doesn't trigger the bug in the unit test suite because the unit tests use small page sizes to test various wrapping behaviors. I need to change the unit test runner to make the page size configurable to properly test this. It also needs to be optimized and simplified. The unit test suite is run under Valgrind so unit tests need to be as fast as possible.

ludocode avatar Feb 15 '22 05:02 ludocode

Yes, that was quick. And now, everything works perfectly. Thanks a lot for your great work!

momentarylapse avatar Feb 16 '22 09:02 momentarylapse

I think I stumbled over this, using v1.1. I haven't confirmed it, but the problem seems very similar and it was fixed by switching to the develop branch. It would be great, if this could be fixed with a new stable release.

ollbx avatar Dec 05 '22 17:12 ollbx

Release v1.1.1 is out. Sorry for the delay. I'll leave this open because I still want to improve the unit tests for this.

ludocode avatar Jan 09 '23 03:01 ludocode