lgt8fx icon indicating copy to clipboard operation
lgt8fx copied to clipboard

Reduce flash size

Open dbuezas opened this issue 2 years ago • 15 comments

I did the experiment @LaZsolt and @nerdralph propose

Just replacing Timer3 initialisation frees up 16 bytes

before

sbi(TCCR3B, CS31);		// set timer 3 prescale factor to 64
sbi(TCCR3B, CS30);
sbi(TCCR3A, WGM30);		// put timer 3 in 8-bit phase correct pwm mode

after

TCCR3B = _BV(CS31) | _BV(CS30);		// set timer 3 prescale factor to 64
TCCR3A = _BV(WGM30);		// put timer 3 in 8-bit phase correct pwm mode

Modifying the other timers and the ADC prescaler code frees up a total of 78 bytes.

I assume we could also port the newer Wire & co libraries from the latest uno core, they seem to have become more efficient. I think @jg1uaa was trying that.

And of course making the EEPROM space configurable via menu would add another 2k

Originally posted by @dbuezas in https://github.com/dbuezas/lgt8fx/issues/33#issuecomment-711066580

Will take PRs if anybody wants to give it a shot :)

dbuezas avatar Jan 26 '23 21:01 dbuezas

1

I found that the main.cpp source code is a bit untidy because some CLOCK_SOURCE dependent code snippets can be found in several places. In the function void lgt8fx8x_init() and void lgt8fx8x_clk_src() It would be clearer after the reorganization into one function.

2

On the other hand, four bytes could be saved in all such cases https://github.com/dbuezas/lgt8fx/blob/960b49a7b09de89a8967c7cd7652884726617216/lgt8f/cores/lgt8f/main.cpp#L79-L82 if code replaced to this:

uint8_t xyz_temp = PMCR & 0x9f;
PMCR = 0x80;
PMCR = xyz_temp;

3

And one more. When main clock switched to external I think no need to wait after core switched. https://github.com/dbuezas/lgt8fx/blob/960b49a7b09de89a8967c7cd7652884726617216/lgt8f/cores/lgt8f/main.cpp#L56-L70

There is also an opportunity to save here some bytes:

  uint8_t c = 1;
  do { for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0); } while (c--);

LaZsolt avatar Feb 28 '23 13:02 LaZsolt

Why does not using GPIOR0 save space?

Being a IO register, I think guarantees the compiler doesn't remove the code (b/c it is volatile). It may be a way to have a very predictable busy wait in the loops.

dbuezas avatar Feb 28 '23 17:02 dbuezas

I compiled an original and a recommended code. After it done, I turned the code to assembler source. C:\Program Files (x86)\Arduino\hardware\tools\avr\bin\avr-objdump.exe" -S test.elf > x.txt

Initial code snippet:

} else if(mode == INT_OSC) {
    // prescaler settings     - CLKPR addr 0x61
    CLKPR = 0x80;             // 80 e8         ldi r24, 0x80
                              // 80 93 61 00   sts 0x0061, r24
    CLKPR = 0x01;             // 91 e0         ldi r25, 0x01
                              // 90 93 61 00   sts 0x0061, r25

    // switch to int. crystal - PMCR addr 0xF2
    GPIOR0 = PMCR & 0x9f;     // 90 91 f2 00   lds r25, 0x00F2
                              // 9f 79         andi r25, 0x9F
                              // 9e bb         out 0x1e, r25
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24 - 0x80 is already in r24
    PMCR = GPIOR0;            // 9e b3         in r25, 0x1e
                              // 90 93 f2 00   sts 0x00F2, r25
                              - GPIOR0 addr 0x1e
    // disable ext. crystal
    GPIOR0 = PMCR & 0xfb;     // 90 91 f2 00   lds r25, 0x00F2
                              // 9b 7f         andi r25, 0xFB
                              // 9e bb         out 0x1e, r25
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24
    PMCR = GPIOR0;            // 9e b3         in r25, 0x1e
                              // 90 93 f2 00   sts 0x00F2, r25
}

Recommended code snippet:

} else if(mode == INT_OSC) {
    // prescaler settings     - CLKPR addr 0x61
    CLKPR = 0x80;             // 80 e8         ldi r24, 0x80
                              // 80 93 61 00   sts 0x0061, r24
    CLKPR = 0x01;             // 91 e0         ldi r25, 0x01
                              // 90 93 61 00   sts 0x0061, r25

    // switch to int. crystal - PMCR addr 0xF2
unint8_t temp_ = PMCR & 0x9f; // 90 91 f2 00   lds r25, 0x00F2
                              // 9f 79         andi r25, 0x9F
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24 - 0x80 is already in r24
    PMCR = temp_;             // 90 93 f2 00   sts 0x00F2, r25

    // disable ext. crystal
    temp_ = PMCR & 0xfb;      // 90 91 f2 00   lds r25, 0x00F2
                              // 9b 7f         andi r25, 0xFB
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24
    PMCR = temp_;             // 90 93 f2 00   sts 0x00F2, r25
}

LaZsolt avatar Feb 28 '23 18:02 LaZsolt

Oh, that makes sense now. If it's put in a variable, the compiler just keeps that at hand in a CPU register. GPIO0R is not a cpu register so extra operations are needed to move the value in and out.

Regarding the waits, I think I see what you are up to: using __asm__ __volatile__ ("nop\n\t"); inside a loop with a normal variable would also save space (maybe even two nop's or a uint16_t counter to avoid the having two loops, not sure what would take instructions)

dbuezas avatar Feb 28 '23 22:02 dbuezas

(maybe even two nop's or a uint16_t counter to avoid the having two loops, not sure what would take instructions)

Yes! This is the solution.

This code takes about 3566 cycles. (16 instructions, 32 bytes)

for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);

Tis replacement code takes the same amount of cycles. (4 instructions, 8 bytes)

_lgt8fx_delay_cycles(3566);

_lgt8fx_delay_cycles() is part of the delayMicroseconds() and is based on __builtin_avr_delay_cycles()

LaZsolt avatar Mar 01 '23 07:03 LaZsolt

Nice! another 24 bytes down! BTW, did you give the timer setup code a look? (the one in the first post in the issue)

dbuezas avatar Mar 01 '23 10:03 dbuezas

Replacing sbi with pure bit manipulation save a bunch of space too.

Yes, save a bunch of space. That was a good suggestion. It will be a lot of work to prepare an addition menu for the EEPROM's size selection.

LaZsolt avatar Mar 01 '23 12:03 LaZsolt

It will be a lot of work to prepare an addition menu for the EEPROM's size selection.

The menu side is not that bad, I made that for the clock originally (that was originally the main reason for this repo to exist).

dbuezas avatar Mar 01 '23 14:03 dbuezas

If you give me a constant name and values for each EEPROM size I can add the menu side. Code wise it is just a bunch of if macros

dbuezas avatar Mar 01 '23 14:03 dbuezas

Ok, I'm assuming that ot os enough to add variants to the boards.txt that change:

  • upload.maximum_data_size=xxx (currently only 2048)
  • upload.maximum_size=xxx (currently only 29696) Plus something inside main.cpp so the EEPROM library k ow how much space it had to set/clear without touching the program space.

Is there anything else that needs to be done?

dbuezas avatar Mar 01 '23 18:03 dbuezas

Few more byte savings when initializing Timer1.

before

#if defined(TCCR1B) && defined(CS11) && defined(CS10)
	TCCR1B = 0;

	// set timer 1 prescale factor to 64
	sbi(TCCR1B, CS11);
#if F_CPU >= 8000000L
	sbi(TCCR1B, CS10);
#endif
#elif defined(TCCR1) && defined(CS11) && defined(CS10)
	sbi(TCCR1, CS11);
#if F_CPU >= 8000000L
	sbi(TCCR1, CS10);
#endif
#endif

after

#if defined(TCCR1B) && defined(CS11) && defined(CS10)
    #if F_CPU >= 8000000L
        TCCR1B = _BV(CS11) | _BV(CS10);
    #else
        TCCR1B = _BV(CS11);
    #endif
#elif defined(TCCR1) && defined(CS11) && defined(CS10)
    #if F_CPU >= 8000000L
        TCCR1 = _BV(CS11) | _BV(CS10);
    #else
        TCCR1 = _BV(CS11);
    #endif
#endif

LaZsolt avatar Mar 01 '23 19:03 LaZsolt

Great! Should we get rid of all those #if defined ? we're only supporting the 328p variant, right?

dbuezas avatar Mar 01 '23 19:03 dbuezas

Should we get rid of all those #if defined ? we're only supporting the 328p variant, right?

We get rid of almost all of them, but we supporting 328d too, isn't it?

LaZsolt avatar Mar 01 '23 19:03 LaZsolt

I'll be away from the computer for a few days, so I'm not sure I'll be able to answer during that time.

LaZsolt avatar Mar 01 '23 19:03 LaZsolt

Add menu options to disable PWM functions to save FLASH space.

In lot of cases PWM functions not used. If someone doesn't want to use PWM capability with this menu will frees up additional FLASH space by not initializing the timers in wiring.c init() function.

In the same way, perhaps the ADC initialization could also be omitted.

LaZsolt avatar Apr 11 '23 08:04 LaZsolt