eRCaGuy_TimerCounter
Optimize access to _overflow_count from the ISR
Making this counter a static (rather than per-instance) class member allows the compiler to access it using direct (instead of indirect) load/store instructions. This saves the instructions needed to save, set and restore a pointer register, and makes the ISR shorter by 6 instructions and 10 CPU cycles. Even though it is not a huge improvement, a small speed-up can count within an ISR.
Note that TIMER2_OVF_vect() had to be declared a “friend” function of the class in order to be able to access the counter, which is private. An alternative would be to make the counter either a public class member or a “static” (as in “static linkage”) global variable of eRCaGuy_Timer2_Counter.cpp.
The gain in instructions and CPU cycles was determined by disassembling the provided example display_time_elapsed, compiled for an Uno with Arduino 1.8.13.
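For readers who want to see the shape of the change, here is a minimal host-compilable sketch of the static-counter-plus-friend pattern described above. It is not the library's actual code: the class name `Timer2Counter`, the accessor `overflowCount()` and the handler name `timer2_overflow_handler()` are all hypothetical stand-ins. On a real AVR target the handler would be the function defined with `ISR(TIMER2_OVF_vect)`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the interrupt handler. On an AVR this would be
// the function defined by ISR(TIMER2_OVF_vect) from <avr/interrupt.h>.
extern "C" void timer2_overflow_handler();

class Timer2Counter {  // hypothetical name, not the library's class
public:
    // Read accessor. On an 8-bit AVR, a multi-byte read like this would
    // also have to be protected against the interrupt firing mid-read.
    static uint32_t overflowCount() { return _overflow_count; }

private:
    // Static member: a single copy at a fixed address, so the handler can
    // reach it without going through a `this` pointer.
    static volatile uint32_t _overflow_count;

    // Grants the handler access to the private counter.
    friend void timer2_overflow_handler();
};

// Definition of the static member (required once, in one .cpp file).
volatile uint32_t Timer2Counter::_overflow_count = 0;

extern "C" void timer2_overflow_handler() {
    // Explicit read-modify-write; avoids ++ on a volatile, which newer
    // C++ standards deprecate.
    Timer2Counter::_overflow_count = Timer2Counter::_overflow_count + 1;
}
```

Calling `timer2_overflow_handler()` twice by hand and then reading `Timer2Counter::overflowCount()` should give 2. The instruction and cycle savings quoted above apply only to the AVR build, not to this host-side stand-in.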
I'll definitely take a look at this when I get around to refactoring the whole library. See also: https://github.com/ElectricRCAircraftGuy/eRCaGuy_TimerCounter/pull/8#issuecomment-665187087.
Disassembling is something I don't know much about. I'd like to learn how to do it. Are you saying you simply are looking at the intermediate *.s file obtained with something like gcc -save-temps=obj foo.c -o ./bin/foo? How do you disassemble? I have very little experience looking at assembly as well, and it is totally unique to each hardware processor architecture, no?
Are you saying you simply are looking at the intermediate *.s file
Well, no, I am rather disassembling the compiled binary with something like:
avr-objdump --source --demangle test.elf > test.lss
To be fair, I just type make disasm, and Sudar Muthu's Arduino Makefile does the rest.
Looking at the intermediate assembly would be meaningless, as recent versions of the Arduino IDE (and, accordingly, the above Makefile) enable link-time optimization. The compiler's intermediate representation is carried over all the way to the link phase. The object files typically contain both a throw-away, unoptimized machine-code version and that intermediate representation. The optimization is performed on the intermediate representation of the whole program, at link time. Only then is the machine code that will run on the MCU actually generated.
[assembly] is totally unique to each hardware processor architecture, no?
It is. I confess AVR is the only one I know. I have been playing for years with the idea of learning x86 assembly, and have always been put off by its sheer complexity. The day I discovered AVR assembly, its simplicity was a revelation, and learning it was a joy. :-)