Zero page allocation for size
The current whole-program zero page allocator allocates only for speed, not for size. However, a (difficult to reproduce) real project by juj showed that the code size difference between zero page allocations could amount to kilobytes. Accordingly, in -Os, we should balance size and speed, and in -Oz, we should prefer size to speed.
Awesome work investigating; happy to read that the issue was finally reproducible.
Here is that test case preserved for posterity: attribute_leaf_test_case.zip
I'll add that this is not a blocking issue of any kind. It arose from an observation that adding __attribute__((leaf)) produced larger code, when the expectation was that it shouldn't have made a difference, so I wanted to help dig into that observation (and not leave it behind).
In my project I simply still omit the __attribute__((leaf)) bits to have the smaller code size, and that works well for the purpose of this project.
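For readers unfamiliar with the attribute, here is a minimal sketch of the kind of declaration being discussed. It is not taken from the attached test case: the names EXT_LEAF, scale_sample, and mix are hypothetical. As documented for GCC, leaf on an external function tells the compiler that a call to it returns to the current translation unit only by return or exception handling, so it will not call back into functions exported from this unit.

```c
/* Enable the attribute only where the compiler reports support for it.
   EXT_LEAF is a hypothetical macro name used for this sketch. */
#if defined(__has_attribute)
#  if __has_attribute(leaf)
#    define EXT_LEAF __attribute__((leaf))
#  endif
#endif
#ifndef EXT_LEAF
#  define EXT_LEAF /* attribute omitted, as in the smaller build described above */
#endif

/* Hypothetical routine that would normally live in another translation
   unit; the attribute goes on the declaration the caller sees. */
EXT_LEAF unsigned char scale_sample(unsigned char v);

/* Caller whose code generation may differ depending on whether the
   callee is known to be a leaf. */
unsigned char mix(unsigned char a, unsigned char b) {
    return (unsigned char)(scale_sample(a) + scale_sample(b));
}

/* Defined here only so the sketch builds standalone. GCC documents that
   leaf has no effect on functions defined in the current unit. */
unsigned char scale_sample(unsigned char v) {
    return (unsigned char)(v >> 1);
}
```

Defining EXT_LEAF as empty gives the configuration without the attribute, so the two variants can be built and their sizes compared.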