lgt8fx
lgt8fx copied to clipboard
Reduce flash size
I did the experiment @LaZsolt and @nerdralph propose
Just replacing Timer3 initialisation frees up 16 bytes
before
sbi(TCCR3B, CS31); // set timer 3 prescale factor to 64
sbi(TCCR3B, CS30);
sbi(TCCR3A, WGM30); // put timer 3 in 8-bit phase correct pwm mode
after
TCCR3B = _BV(CS31) | _BV(CS30); // set timer 3 prescale factor to 64
TCCR3A = _BV(WGM30); // put timer 3 in 8-bit phase correct pwm mode
Modifying the other timers and the ADC prescaler code frees up a total of 78 bytes.
I assume we could also port the newer Wire & co libraries from the latest uno core, they seem to have become more efficient. I think @jg1uaa was trying that.
And of course making the EEPROM space configurable via menu would add another 2k
Originally posted by @dbuezas in https://github.com/dbuezas/lgt8fx/issues/33#issuecomment-711066580
Will take PRs if anybody wants to give it a shot :)
1
I found that the main.cpp source code is a bit untidy because some CLOCK_SOURCE dependent code snippets can be found in several places. In the function void lgt8fx8x_init() and void lgt8fx8x_clk_src()
It would be clearer after the reorganization into one function.
2
On the other hand, four bytes could be saved in all such cases https://github.com/dbuezas/lgt8fx/blob/960b49a7b09de89a8967c7cd7652884726617216/lgt8f/cores/lgt8f/main.cpp#L79-L82 if code replaced to this:
uint8_t xyz_temp = PMCR & 0x9f;
PMCR = 0x80;
PMCR = xyz_temp;
3
And one more. When main clock switched to external I think no need to wait after core switched. https://github.com/dbuezas/lgt8fx/blob/960b49a7b09de89a8967c7cd7652884726617216/lgt8f/cores/lgt8f/main.cpp#L56-L70
There is also an opportunity to save here some bytes:
uint8_t c = 1;
do { for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0); } while (c--);
Why does not using GPIOR0 save space?
Being a IO register, I think guarantees the compiler doesn't remove the code (b/c it is volatile). It may be a way to have a very predictable busy wait in the loops.
I compiled an original and a recommended code. After it done, I turned the code to assembler source.
C:\Program Files (x86)\Arduino\hardware\tools\avr\bin\avr-objdump.exe" -S test.elf > x.txt
Initial code snippet:
} else if(mode == INT_OSC) {
// prescaler settings - CLKPR addr 0x61
CLKPR = 0x80; // 80 e8 ldi r24, 0x80
// 80 93 61 00 sts 0x0061, r24
CLKPR = 0x01; // 91 e0 ldi r25, 0x01
// 90 93 61 00 sts 0x0061, r25
// switch to int. crystal - PMCR addr 0xF2
GPIOR0 = PMCR & 0x9f; // 90 91 f2 00 lds r25, 0x00F2
// 9f 79 andi r25, 0x9F
// 9e bb out 0x1e, r25
PMCR = 0x80; // 80 93 f2 00 sts 0x00F2, r24 - 0x80 is already in r24
PMCR = GPIOR0; // 9e b3 in r25, 0x1e
// 90 93 f2 00 sts 0x00F2, r25
- GPIOR0 addr 0x1e
// disable ext. crystal
GPIOR0 = PMCR & 0xfb; // 90 91 f2 00 lds r25, 0x00F2
// 9b 7f andi r25, 0xFB
// 9e bb out 0x1e, r25
PMCR = 0x80; // 80 93 f2 00 sts 0x00F2, r24
PMCR = GPIOR0; // 9e b3 in r25, 0x1e
// 90 93 f2 00 sts 0x00F2, r25
}
Recommended code snippet:
} else if(mode == INT_OSC) {
// prescaler settings - CLKPR addr 0x61
CLKPR = 0x80; // 80 e8 ldi r24, 0x80
// 80 93 61 00 sts 0x0061, r24
CLKPR = 0x01; // 91 e0 ldi r25, 0x01
// 90 93 61 00 sts 0x0061, r25
// switch to int. crystal - PMCR addr 0xF2
unint8_t temp_ = PMCR & 0x9f; // 90 91 f2 00 lds r25, 0x00F2
// 9f 79 andi r25, 0x9F
PMCR = 0x80; // 80 93 f2 00 sts 0x00F2, r24 - 0x80 is already in r24
PMCR = temp_; // 90 93 f2 00 sts 0x00F2, r25
// disable ext. crystal
temp_ = PMCR & 0xfb; // 90 91 f2 00 lds r25, 0x00F2
// 9b 7f andi r25, 0xFB
PMCR = 0x80; // 80 93 f2 00 sts 0x00F2, r24
PMCR = temp_; // 90 93 f2 00 sts 0x00F2, r25
}
Oh, that makes sense now. If it's put in a variable, the compiler just keeps that at hand in a CPU register. GPIO0R is not a cpu register so extra operations are needed to move the value in and out.
Regarding the waits, I think I see what you are up to: using __asm__ __volatile__ ("nop\n\t"); inside a loop with a normal variable would also save space (maybe even two nop's or a uint16_t counter to avoid the having two loops, not sure what would take instructions)
(maybe even two nop's or a uint16_t counter to avoid the having two loops, not sure what would take instructions)
Yes! This is the solution.
This code takes about 3566 cycles. (16 instructions, 32 bytes)
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);
Tis replacement code takes the same amount of cycles. (4 instructions, 8 bytes)
_lgt8fx_delay_cycles(3566);
_lgt8fx_delay_cycles() is part of the delayMicroseconds() and is based on __builtin_avr_delay_cycles()
Nice! another 24 bytes down! BTW, did you give the timer setup code a look? (the one in the first post in the issue)
Replacing sbi with pure bit manipulation save a bunch of space too.
Yes, save a bunch of space. That was a good suggestion. It will be a lot of work to prepare an addition menu for the EEPROM's size selection.
It will be a lot of work to prepare an addition menu for the EEPROM's size selection.
The menu side is not that bad, I made that for the clock originally (that was originally the main reason for this repo to exist).
If you give me a constant name and values for each EEPROM size I can add the menu side. Code wise it is just a bunch of if macros
Ok, I'm assuming that ot os enough to add variants to the boards.txt that change:
- upload.maximum_data_size=xxx (currently only 2048)
- upload.maximum_size=xxx (currently only 29696) Plus something inside main.cpp so the EEPROM library k ow how much space it had to set/clear without touching the program space.
Is there anything else that needs to be done?
Few more byte savings when initializing Timer1.
before
#if defined(TCCR1B) && defined(CS11) && defined(CS10)
TCCR1B = 0;
// set timer 1 prescale factor to 64
sbi(TCCR1B, CS11);
#if F_CPU >= 8000000L
sbi(TCCR1B, CS10);
#endif
#elif defined(TCCR1) && defined(CS11) && defined(CS10)
sbi(TCCR1, CS11);
#if F_CPU >= 8000000L
sbi(TCCR1, CS10);
#endif
#endif
after
#if defined(TCCR1B) && defined(CS11) && defined(CS10)
#if F_CPU >= 8000000L
TCCR1B = _BV(CS11) | _BV(CS10);
#else
TCCR1B = _BV(CS11);
#endif
#elif defined(TCCR1) && defined(CS11) && defined(CS10)
#if F_CPU >= 8000000L
TCCR1 = _BV(CS11) | _BV(CS10);
#else
TCCR1 = _BV(CS11);
#endif
#endif
Great!
Should we get rid of all those #if defined ? we're only supporting the 328p variant, right?
Should we get rid of all those
#if defined? we're only supporting the 328p variant, right?
We get rid of almost all of them, but we supporting 328d too, isn't it?
I'll be away from the computer for a few days, so I'm not sure I'll be able to answer during that time.
Add menu options to disable PWM functions to save FLASH space.
In lot of cases PWM functions not used. If someone doesn't want to use PWM capability with this menu will frees up additional FLASH space by not initializing the timers in wiring.c init() function.
In the same way, perhaps the ADC initialization could also be omitted.