
Large compiled binary size due to inline expansion in ETL Library

Open Tagussan opened this issue 1 year ago • 20 comments

Hi,

I'm using the ETL library in my project and I've found that the compiled binary is very large. On disassembling the object file, I noticed that functions from the ETL library are expanded inline everywhere. In particular, operations on etl::string and etl::vector are expanded inline and generate a lot of instructions.

Is there any way to reduce or disable this level of inline expansion? This is becoming a problem in my project, as it takes up a substantial amount of Flash on my MCU.

Tagussan avatar Jun 23 '23 13:06 Tagussan

What compiler are you using? GCC has a -fno-inline compile option. There is also a -Os option (optimize for size) that may help.

Can you give me an example of a function that is being inlined?

jwellbelove avatar Jun 23 '23 14:06 jwellbelove

I'm using g++. I haven't tried those flags yet.

Here's an example. In the disassembly below, we can see the ASSERT check, the CAPACITY check, some pointer operations, and so on, none of which is written explicitly in the original code.

ETL_EXPLICIT_STRING_FROM_CHAR string(const value_type* text)
      : istring(reinterpret_cast<value_type*>(&buffer), MAX_SIZE)
    {
      this->assign(text, text + etl::char_traits<value_type>::length(text));
      40:	4ab1      	ldr	r2, [pc, #708]	; (308 <SomeName::SomeName::SomeClass::setup()+0x308>)
      ETL_ASSERT(d >= 0, ETL_ERROR(string_iterator));
      42:	2900      	cmp	r1, #0
      44:	eb01 0502 	add.w	r5, r1, r2
      48:	f2c1 80f4 	blt.w	1234 <SomeName::SomeName::SomeClass::setup()+0x1234>
      while ((first != last) && (current_size != CAPACITY))
      4c:	4295      	cmp	r5, r2
      p_buffer[0] = 0;
      4e:	f88d 30b4 	strb.w	r3, [sp, #180]	; 0xb4
      while ((first != last) && (current_size != CAPACITY))
      52:	f001 80ed 	beq.w	1230 <SomeName::SomeName::SomeClass::setup()+0x1230>
      56:	3d01      	subs	r5, #1
      58:	4601      	mov	r1, r0
      5a:	24e6      	movs	r4, #230	; 0xe6
      5c:	e002      	b.n	64 <SomeName::SomeName::SomeClass::setup()+0x64>
        p_buffer[current_size++] = *first++;
      5e:	f812 4f01 	ldrb.w	r4, [r2, #1]!
      62:	992c      	ldr	r1, [sp, #176]	; 0xb0
      64:	1c58      	adds	r0, r3, #1
      while ((first != last) && (current_size != CAPACITY))
      66:	4295      	cmp	r5, r2
        p_buffer[current_size++] = *first++;
      68:	9029      	str	r0, [sp, #164]	; 0xa4
      6a:	54cc      	strb	r4, [r1, r3]
      while ((first != last) && (current_size != CAPACITY))
      6c:	f000 87a4 	beq.w	fb8 <SomeName::SomeName::SomeClass::setup()+0xfb8>
      70:	e9dd 3129 	ldrd	r3, r1, [sp, #164]	; 0xa4
      74:	428b      	cmp	r3, r1
      76:	d1f2      	bne.n	5e <SomeName::SomeName::SomeClass::setup()+0x5e>
      p_buffer[current_size] = 0;
      78:	9a2c      	ldr	r2, [sp, #176]	; 0xb0
      7a:	2100      	movs	r1, #0
      7c:	54d1      	strb	r1, [r2, r3]
      value ? data |= (pattern & MASK) : data &= (~pattern & MASK);
      7e:	f89d 30ac 	ldrb.w	r3, [sp, #172]	; 0xac
      82:	f043 0301 	orr.w	r3, r3, #1
      86:	f88d 30ac 	strb.w	r3, [sp, #172]	; 0xac
        this->x = _x;
      8a:	23a0      	movs	r3, #160	; 0xa0
      return &p_buffer[current_size];
      8c:	9e29      	ldr	r6, [sp, #164]	; 0xa4
      8e:	f8aa 30a2 	strh.w	r3, [sl, #162]	; 0xa2
      ETL_ASSERT(d >= 0, ETL_ERROR(string_iterator));
      92:	2e00      	cmp	r6, #0
        this->y = _y;
      94:	f04f 0305 	mov.w	r3, #5
      98:	f8aa 30a4 	strh.w	r3, [sl, #164]	; 0xa4
      9c:	f2c1 80ca 	blt.w	1234 <SomeName::SomeName::SomeClass::setup()+0x1234>
      if (is_secure())
      a0:	f89a 30bc 	ldrb.w	r3, [sl, #188]	; 0xbc
      current_size = 0U;
      a4:	2100      	movs	r1, #0
      return &p_buffer[0];
      a6:	9c2c      	ldr	r4, [sp, #176]	; 0xb0
      if (is_secure())
      a8:	0798      	lsls	r0, r3, #30
      current_size = 0U;
      aa:	f8ca 10b4 	str.w	r1, [sl, #180]	; 0xb4
      if (is_secure())
      ae:	d509      	bpl.n	c4 <SomeName::SomeName::SomeClass::setup()+0xc4>
        etl::memory_clear_range(&p_buffer[current_size], &p_buffer[CAPACITY]);
      b0:	f8da 00b8 	ldr.w	r0, [sl, #184]	; 0xb8
      b4:	f8da 30c0 	ldr.w	r3, [sl, #192]	; 0xc0
      b8:	b130      	cbz	r0, c8 <SomeName::SomeName::SomeClass::setup()+0xc8>
      ba:	4418      	add	r0, r3
      *p++ = 0;
      bc:	7019      	strb	r1, [r3, #0]
      be:	3301      	adds	r3, #1
    while (n--)
      c0:	4298      	cmp	r0, r3
      c2:	d1fb      	bne.n	bc <SomeName::SomeName::SomeClass::setup()+0xbc>
      p_buffer[0] = 0;
      c4:	f8da 30c0 	ldr.w	r3, [sl, #192]	; 0xc0
      c8:	2200      	movs	r2, #0
      return &p_buffer[current_size];
      ca:	4426      	add	r6, r4
      p_buffer[0] = 0;
      cc:	701a      	strb	r2, [r3, #0]
      while ((first != last) && (current_size != CAPACITY))
      ce:	42b4      	cmp	r4, r6
      d0:	f89a 30bc 	ldrb.w	r3, [sl, #188]	; 0xbc
      d4:	f023 0301 	bic.w	r3, r3, #1
      d8:	f88a 30bc 	strb.w	r3, [sl, #188]	; 0xbc
      dc:	f001 801b 	beq.w	1116 <SomeName::SomeName::SomeClass::setup()+0x1116>
      e0:	4622      	mov	r2, r4
      e2:	e00a      	b.n	fa <SomeName::SomeName::SomeClass::setup()+0xfa>
        p_buffer[current_size++] = *first++;
      e4:	f812 0b01 	ldrb.w	r0, [r2], #1
      e8:	1c5d      	adds	r5, r3, #1
      ea:	f8da 10c0 	ldr.w	r1, [sl, #192]	; 0xc0
      while ((first != last) && (current_size != CAPACITY))
      ee:	4296      	cmp	r6, r2
        p_buffer[current_size++] = *first++;
      f0:	f8ca 50b4 	str.w	r5, [sl, #180]	; 0xb4
      f4:	54c8      	strb	r0, [r1, r3]
      while ((first != last) && (current_size != CAPACITY))
      f6:	f001 800e 	beq.w	1116 <SomeName::SomeName::SomeClass::setup()+0x1116>
      fa:	e9da 312d 	ldrd	r3, r1, [sl, #180]	; 0xb4
      fe:	428b      	cmp	r3, r1
     100:	d1f0      	bne.n	e4 <SomeName::SomeName::SomeClass::setup()+0xe4>
      p_buffer[current_size] = 0;
     102:	f8da 20c0 	ldr.w	r2, [sl, #192]	; 0xc0
     106:	2100      	movs	r1, #0
     108:	54d1      	strb	r1, [r2, r3]
     10a:	f89a 30bc 	ldrb.w	r3, [sl, #188]	; 0xbc
     10e:	f043 0301 	orr.w	r3, r3, #1
     112:	f88a 30bc 	strb.w	r3, [sl, #188]	; 0xbc
      return (data & pattern) != value_type(0);
     116:	f89d 50ac 	ldrb.w	r5, [sp, #172]	; 0xac
      if (other.is_truncated())
     11a:	07ea      	lsls	r2, r5, #31
     11c:	d503      	bpl.n	126 <SomeName::SomeName::SomeClass::setup()+0x126>
      value ? data |= (pattern & MASK) : data &= (~pattern & MASK);
     11e:	f043 0301 	orr.w	r3, r3, #1
     122:	f88a 30bc 	strb.w	r3, [sl, #188]	; 0xbc
      if (other.is_secure())
     126:	f015 0502 	ands.w	r5, r5, #2
     12a:	f041 8010 	bne.w	114e <SomeName::SomeName::SomeClass::setup()+0x114e>
      if (is_secure())
     12e:	079f      	lsls	r7, r3, #30
     130:	f101 8011 	bmi.w	1156 <SomeName::SomeName::SomeClass::setup()+0x1156>

Tagussan avatar Jun 23 '23 14:06 Tagussan

ETL_EXPLICIT_STRING_FROM_CHAR string(const value_type* text) calls void assign(TIterator first, TIterator last) which ETL_ASSERTS that the iterated distance is >=0 (if ETL_IS_DEBUG_BUILD is true), initialises the string (clearing the buffer first if ETL_HAS_STRING_CLEAR_AFTER_USE is true and the secure flag is set), and then fills the buffer with the characters, whilst ensuring that the CAPACITY is not exceeded.

    template <typename TIterator>
    void assign(TIterator first, TIterator last)
    {
#if ETL_IS_DEBUG_BUILD
      difference_type d = etl::distance(first, last);
      ETL_ASSERT(d >= 0, ETL_ERROR(string_iterator));
#endif

      initialise();

      while ((first != last) && (current_size != CAPACITY))
      {
        p_buffer[current_size++] = *first++;
      }

      p_buffer[current_size] = 0;

#if ETL_HAS_STRING_TRUNCATION_CHECKS
      set_truncated(first != last);

#if ETL_HAS_ERROR_ON_STRING_TRUNCATION
      ETL_ASSERT(flags.test<IS_TRUNCATED>() == false, ETL_ERROR(string_truncation))
#endif
#endif
    }

Your compiler appears to be inlining the initialise() function, specifically etl::memory_clear, which just contains a simple while loop.

    while (n--)
    {
      *p++ = 0;
    }

jwellbelove avatar Jun 23 '23 14:06 jwellbelove

Or are you saying that the compiler inlines the string(const value_type* text) constructor at every place it's called?

jwellbelove avatar Jun 23 '23 15:06 jwellbelove

the compiler inlines the string(const value_type* text) constructor at every place it's called?

Yes, comparing against the original code, that appears to be the case. I'm compiling with the -O2 flag.
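
One possible workaround, sketched here under assumptions rather than tested against ETL: route the construction through a single helper marked with the GCC/Clang `noinline` attribute, so each call site becomes a branch to one shared copy instead of an expanded loop. `SmallText` and `assign_text` are hypothetical names that stand in for the real string type and a project-level wrapper.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity buffer standing in for a string type.
struct SmallText
{
  char        data[16];
  std::size_t size;
};

// The noinline attribute (GCC/Clang) asks the compiler not to inline this
// function, so the copy loop is emitted once and called from each call site.
__attribute__((noinline))
void assign_text(SmallText& dst, const char* text)
{
  dst.size = 0;

  // Copy until the source terminator or until only the terminator slot is left.
  while ((*text != 0) && (dst.size != sizeof(dst.data) - 1))
  {
    dst.data[dst.size++] = *text++;
  }

  dst.data[dst.size] = 0;
}
```

Whether this actually shrinks the binary depends on how many call sites there are and on what the optimiser could otherwise fold away, so it is worth comparing map files before and after.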

Tagussan avatar Jun 23 '23 15:06 Tagussan

Hi, I am also experiencing this issue. The linked ETL code consumes around 100 kB (nearly 80% of ROM), and I am only using vector, map, delegate and variant (with 2 types).

2 years ago I also used ETL without any explicit linkage, simply by including the headers, and I don't remember it consuming that much.

Also, disabling inlining makes things even worse, by 5 kB. I am already compiling with -Os. Without any optimisation, the build exceeds ROM by 13%.

quickshat avatar Jun 29 '23 16:06 quickshat

What error handling configuration are you using? Exceptions can add a ton of code. Also RTTI.

jwellbelove avatar Jun 29 '23 16:06 jwellbelove

I am using ETL_NO_CHECKS. And as far as I know, ETL only bloats due to RTTI when exceptions are enabled, right?

quickshat avatar Jun 29 '23 16:06 quickshat

In my experience RTTI and exceptions are independent of each other. I looked at using exceptions in a project once and the code size increased dramatically.

jwellbelove avatar Jun 29 '23 16:06 jwellbelove

Do you have some sample code that I can try with the embedded compilers I have installed on my machine?

jwellbelove avatar Jun 29 '23 16:06 jwellbelove

Do you have some sample code that I can try with the embedded compilers I have installed on my machine?

I've sent you my project in its current state, which builds successfully on my machine, as a download link via the contact form on the ETL website.

quickshat avatar Jun 29 '23 17:06 quickshat

Thanks. I'll take a look at the weekend as I'm away in London until Friday evening.

jwellbelove avatar Jun 29 '23 17:06 jwellbelove

I've brought the project into STM32CubeIDE, but the build fails due to not finding the #include "stm32f1xx_hal.h" for the drivers.

jwellbelove avatar Jul 01 '23 17:07 jwellbelove

You have to download the F1 firmware package first with CUBEMX. I guess you have never used an F1 before?

quickshat avatar Jul 01 '23 18:07 quickshat

I don't normally use STM32CubeIDE. I just have it installed for ETL cross-platform compatibility and bug testing.

jwellbelove avatar Jul 01 '23 19:07 jwellbelove

I don't use CubeIDE either. I am using CMake with standalone CUBEMX.

quickshat avatar Jul 01 '23 19:07 quickshat

Do you have any updates, or should I try to bundle the STM32 F1 SDK?

quickshat avatar Jul 03 '23 07:07 quickshat

I'm still looking at this when I can. I've been very busy with other work recently, but I'll do what I can.

jwellbelove avatar Jul 03 '23 07:07 jwellbelove

I've tried making a simple project in Keil, cutting out all of the hardware-related calls, to see what the map file shows. The code sizes for the ivector member functions seem to be reasonably small.

    Exec Addr    Load Addr    Size         Type   Attr      Idx    E Section Name        Object
    0x08000fa4   0x08000fa4   0x0000001c   Code   RO         90    .text._ZN3etl7ivectorItE10initialiseEv  scase.o
    0x08000fc0   0x08000fc0   0x0000002a   Code   RO        142    .text._ZN3etl7ivectorItE11create_backEOt  scase.o
    0x08000fea   0x08000fea   0x00000002   PAD
    0x08000fec   0x08000fec   0x00000010   Code   RO         96    .text._ZN3etl7ivectorItE5clearEv  scase.o
    0x08000ffc   0x08000ffc   0x0000001e   Code   RO         38    .text._ZN3etl7ivectorItE9push_backEOt  scase.o
    0x0800101a   0x0800101a   0x00000002   PAD
    0x0800101c   0x0800101c   0x00000022   Code   RO         88    .text._ZN3etl7ivectorItEC2EPtj  scase.o
    0x0800103e   0x0800103e   0x00000002   PAD
    0x08001040   0x08001040   0x00000014   Code   RO         98    .text._ZN3etl7ivectorItED2Ev  scase.o

This totals 176 bytes, including 6 bytes of padding.
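
As a quick check of that figure, the Size column can be summed directly. The sketch below only restates the numbers from the map file excerpt; the array and function names are made up for illustration. The six code sections come to 170 bytes and the three PAD entries add another 6.

```cpp
#include <cstddef>

// Sizes in bytes, copied from the Size column of the map file excerpt.
const std::size_t code_sizes[] = { 0x1c, 0x2a, 0x10, 0x1e, 0x22, 0x14 };
const std::size_t pad_sizes[]  = { 0x02, 0x02, 0x02 };

// Sums the elements of a fixed-size array.
template <std::size_t N>
std::size_t total(const std::size_t (&values)[N])
{
  std::size_t sum = 0;

  for (std::size_t i = 0; i < N; ++i)
  {
    sum += values[i];
  }

  return sum;
}
```

Here `total(code_sizes)` gives the code-only figure and `total(code_sizes) + total(pad_sizes)` gives the figure including alignment padding.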

jwellbelove avatar Jul 03 '23 12:07 jwellbelove

OK, that's a good point. You can also look at my last map file; it should be in the cmake-debug-build folder I sent you. I'll take a closer look at it as well. Last time I checked it with a visualizer tool, because the map file is so large, and I wasn't able to track down the issue.

quickshat avatar Jul 03 '23 13:07 quickshat