tinyexpr icon indicating copy to clipboard operation
tinyexpr copied to clipboard

Handle malloc failures

Open mortie opened this issue 5 years ago • 7 comments

This PR adds handling for malloc failures. I don't quite know how one would add sane test cases to cover it, but I did test it manually by replaceing malloc with a malloc implementation which returns NULL 10% of the time, and all of the existing test cases run fine, with no memory errors or memory leaks. The only problem was one place in smoke.c which uses the return value of te_interp without checking for errors.

The approach to error checking is the most basic one; just add a whole bunch of if (ret == NULL) { ...; return NULL; } everywhere. It's not the prettiest, but it works. The alternative would be to setjmp at the beginning of te_compile, then longjmp out of new_expr. It would probably be faster, but setjmp and longjmp is honestly really scary, with a lot of space for accidentally stepping into undefined behavior (for example, all variables which are modified between the setjmp and the longjmp must be volatile). A longjmp-based approach would probably be faster though.


The reason behind doing this isn't just that I think malloc failures should be handled in general. I also think this is a possible approach for custom allocators such as allocators which allocate from a pool of static memory. Here's how I imagine a viable path to statically allocated tinyexpr:

  • Once malloc failures are handled gracefully, add TE_MALLOC and TE_FREE macros which default to malloc and free but can be overwritten at compile time.
  • I can then add -DTE_MALLOC=static_te_malloc -DTE_FREE=static_te_free to my tinyexpr.c compile options.
  • I can then define those functions like this in my own code:
static _Alignas(8) char buffer[1024];
static size_t allocs = 0;
static size_t offset = 0;

void *static_te_malloc(size_t count) {
    while (count % 8 != 0) count += 1; // Align
    if (offset + count >= sizeof(buffer)) {
        return NULL; // This will be handled properly by tinyexpr
    }

    void *ptr = buffer + offset;
    offset += count;
    allocs += 1;
    return ptr;
}

void static_te_free(void *ptr) {
    if (ptr == NULL) return;
    allocs -= 1;
    if (allocs == 0) {
        offset = 0; // All memory freed
    }
}

I personally find this to be a cleaner, simpler and more flexible solution than adding explicit support for using a static memory pool to tinyexpr itself.

I decided to not include those macros in this PR, because it's kind of orthogonal, and I think handling malloc failures has value regardless of whether or not you agree with my idea for supporting custom allocators.


Obviously, adding extra ifs everywhere has a performance cost (though only at te_compile time, not te_eval time). Here are times I got from running hyperfine ./bench.orig ./bench.new (where bench.new is the benchmarking suite with this patch):

Benchmark #1: ./bench.orig
  Time (mean ± σ):     17.170 s ±  0.178 s    [User: 17.164 s, System: 0.004 s]
  Range (min … max):   16.818 s … 17.485 s    10 runs

Benchmark #2: ./bench.new
  Time (mean ± σ):     18.342 s ±  1.088 s    [User: 18.341 s, System: 0.001 s]
  Range (min … max):   17.937 s … 21.431 s    10 runs

Summary
  './bench.orig' ran
    1.07 ± 0.06 times faster than './bench.new'

Here's the output from the unmodified benchmark suite on my machine:

martin@ubun ~/src/tinyexpr master $ ./bench.orig
Expression: a+5
native  5.0045e+11        260ms   384mfps
interp  5.0045e+11        621ms   161mfps
138.85% longer

Expression: 5+a+5
native  5.0095e+11        224ms   446mfps
interp  5.0095e+11        984ms   101mfps
339.29% longer

Expression: abs(a+5)
native  5.0045e+11        222ms   450mfps
interp  5.0045e+11        798ms   125mfps
259.46% longer

Expression: sqrt(a^1.5+a^2.5)
native  4.4443e+12       3323ms    30mfps
interp  4.4443e+12       4696ms    21mfps
41.32% longer

Expression: a+(5*2)
native  5.0095e+11        223ms   448mfps
interp  5.0095e+11        637ms   156mfps
185.65% longer

Expression: (a+5)*2
native  1.0009e+12        225ms   444mfps
interp  1.0009e+12        981ms   101mfps
336.00% longer

Expression: (1/(a+1)+2/(a+2)+3/(a+3))
native  5.2226e+05        297ms   336mfps
interp  5.2226e+05       3241ms    30mfps
991.25% longer

And the benchmark suite with this patch:

martin@ubun ~/src/tinyexpr master $ ./bench.new
Expression: a+5
native  5.0045e+11        259ms   386mfps
interp  5.0045e+11        698ms   143mfps
169.50% longer

Expression: 5+a+5
native  5.0095e+11        223ms   448mfps
interp  5.0095e+11       1169ms    85mfps
424.22% longer

Expression: abs(a+5)
native  5.0045e+11        223ms   448mfps
interp  5.0045e+11        892ms   112mfps
300.00% longer

Expression: sqrt(a^1.5+a^2.5)
native  4.4443e+12       3340ms    29mfps
interp  4.4443e+12       4671ms    21mfps
39.85% longer

Expression: a+(5*2)
native  5.0095e+11        224ms   446mfps
interp  5.0095e+11        758ms   131mfps
238.39% longer

Expression: (a+5)*2
native  1.0009e+12        223ms   448mfps
interp  1.0009e+12       1150ms    86mfps
415.70% longer

Expression: (1/(a+1)+2/(a+2)+3/(a+3))
native  5.2226e+05        327ms   305mfps
interp  5.2226e+05       3834ms    26mfps
1072.48% longer

mortie avatar Oct 13 '20 21:10 mortie

This is largely an alternative to https://github.com/codeplea/tinyexpr/pull/57. If someone who needs static memory (@stawiski? @RallyTronics?) could have a look at my custom allocators suggestion and see if it would work for them, that would be awesome.

mortie avatar Oct 13 '20 21:10 mortie

Good work, and really nice job on the write-up!

I like your approach to the memory checks - honestly this is how it should have been done from the start.

My only concern is that it adds a lot of lines of code. One of the selling points of this library is that it's small.

Would you consider replacing some/most of the checks with a macro?

I'm thinking

            if (insert == NULL) {
                te_free(p);
                te_free(ret);
                return NULL;
            }

could become something like

CHECK_NULL(insert, te_free(p); te_free(ret););

It might not work for all cases, but I think it could still make it a lot more concise.

codeplea avatar Oct 14 '20 04:10 codeplea

That's a good idea. It takes this patch from a +168/-7 to a +90/-7.

https://github.com/codeplea/tinyexpr/pull/68/files#diff-9dd905150ef73f001f71a847631eaf1c3c8cf53b7cb0ec921a200620fdf9cd04R375-R383 This was the only place I couldn't use the CHECK_NULL macro. However, if we changed new_expr to always 0-initialize all parameters up to arity, it could be replaced with CHECK_NULL(ret->parameters[i], te_free(ret)), because te_free would be able to handle it. What do you think?

mortie avatar Oct 14 '20 08:10 mortie

Actually, nevermind. I didn't realize new_expr already zero-initializes the entire expression, including the parameters. I force-pushed a version which uses CHECK_NULL in the ret->parameters[i] case too, turning this into a +83/-7.

mortie avatar Oct 14 '20 08:10 mortie

This approach can be interesting for applications allocation large blocks of data, but it is highly debatable whether there is a need for this kind of resource checking on such small amount of data we talk about here.

If malloc fails in this library, your system probably either already bogged down to a halt, and there is small chances to recover anyway. On my 16GB mem windows machine I can currently allocate 45 GB with malloc (either in one block or in 1GB blocks), and resrouce monitor tells me I already use 9GBs.

I personal feel these things should be managed by the OS in some way, and not by each and every small library.

tylov avatar Oct 23 '20 08:10 tylov

If malloc fails in this library, your system probably either already bogged down to a halt, and there is small chances to recover anyway.

That's true in the very limited case of desktop/server Linux systems with overcommit enabled. It's not true for embedded, not true for Linux without overcommit, not for some other operating systems, not even true for Linux with overcommit enabled but with ulimits in place.

Plus, even if it was 100% true in all cases, part of the reasoning for this patch is to support other allocation schemes such as allocating from one static buffer. It doesn't matter that malloc won't fail if you replace the allocator with one which allocates from a static 8k buffer. This is what the second part of the PR description was about.

What could be interesting though is an option for disabling checked malloc, if that's actually a need people have. One could add an option to replace the CHECK_NULL macro with something like an empty #define CHECK_NULL(...). I personally wouldn't consider the small code size and performance benefits to be worth it, but maybe other people have different considerations.

mortie avatar Oct 25 '20 11:10 mortie

I looked over this PR again, and the changes and overhead is minimal as you also claim. Thumbs up from me.

tylov avatar Oct 25 '20 21:10 tylov

@codeplea Could you merge this PR?

jussike avatar Sep 27 '22 12:09 jussike