CommandStation-EX
Address flash far pointer issues for Mega
To set the ball rolling, I'll put something up to be critically examined and shot down as appropriate.
Pointers are used for access to data and/or functions. Pointers may be near (16-bit) or far (more than 16 bits). Although the flash is arranged as 16-bit elements (the program instruction size), the access routines such as pgm_read_byte() accept a byte address and return the appropriate byte from the 16-bit word. Data or functions in the first 64 KB of flash may be accessed using near pointers. The ATmega2560 has 256 KB of flash, so far pointers are needed if the data or function being accessed is above the 64 KB boundary.
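To make the byte-address versus word-storage point concrete, here is a minimal host-side sketch in plain C++ (not Arduino code). The names `flashByteAt` and `fitsNearPointer` are made up for illustration, and flash is simply modelled as an array of 16-bit words; it is not how avr-libc implements pgm_read_byte().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: flash is an array of 16-bit words, addressed by byte.
// Returns the byte that a byte address selects out of its 16-bit word.
uint8_t flashByteAt(const uint16_t *words, size_t wordCount, uint32_t byteAddr) {
  uint32_t wordIndex = byteAddr >> 1;  // two bytes per flash word
  assert(wordIndex < wordCount);       // stay inside the modelled flash
  uint16_t word = words[wordIndex];
  // Bit 0 of the byte address picks the low (0) or high (1) byte of the word.
  return (byteAddr & 1u) ? static_cast<uint8_t>(word >> 8)
                         : static_cast<uint8_t>(word & 0xFFu);
}

// True if a 16-bit (near) pointer can hold this byte address.
bool fitsNearPointer(uint32_t byteAddr) {
  return byteAddr <= 0xFFFFu;
}
```

With this model, any address up to 0xFFFF passes `fitsNearPointer`, while 0x10000 and above would need a far pointer.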
I've tried wading through the ObjDump output from CS and the PIO Inspect output, and have observed the following.
It looks like the PROGMEM declared arrays are all loaded into a contiguous area of memory near the beginning of the .text section. For example, I can see `LOCO_ID_PROG` at 0x00041D, `SSD1306AsciiWire::System5x7` at 0x000CE9, and `port_to_mode_PGM` at 0x000FFA. Also, there are various symbols like `DCC::ackManagerLoop()::__c` at 0x00052C, which I believe are strings defined as `F("xxxx")` in the code. All in all, these symbols take up the area of flash from address 0x000304 to around 0x001182, where they are followed by `__empty`, `yield`, `turnOffPWM` at 0x001184 and `digitalRead` at 0x001289, which are executable code functions.
So the highest address currently occupied by F() or PROGMEM data appears to be 0x001182, which is well below the limit of 0x00FFFF for near address access. This amounts to 4482 bytes (0x1182) used out of 65536. It is about 6.8% of the space accessible via near pointers, or about 10% of the total flash usage in the CS code examined (43.9 KB).
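The figures can be rechecked from the addresses alone. This is a minimal host-side C++ sketch; the constant and function names are invented for illustration, and the only inputs are the start and end addresses quoted above, with the end address treated as an approximate byte count.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kNearSpaceBytes = 0x10000;  // 64 KB reachable with a 16-bit pointer
constexpr uint32_t kDataStart = 0x000304;      // first PROGMEM/F() symbol seen in the listing
constexpr uint32_t kDataEnd = 0x001182;        // approximate end of the PROGMEM/F() data

// Percentage of `whole` taken up by `part`.
double percentOf(uint32_t part, uint32_t whole) {
  assert(whole != 0);
  return 100.0 * part / whole;
}

// Size of the data area itself, excluding vectors and trampolines below it.
uint32_t dataSpanBytes() {
  return kDataEnd - kDataStart;
}

// Share of the near-addressable 64 KB lying below the end of the data area.
double nearSpaceUsedPercent() {
  return percentOf(kDataEnd, kNearSpaceBytes);
}
```

Measuring from address 0 rather than from `kDataStart` is deliberate: what matters for near pointers is how high the data reaches, not how large it is.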
For program calls, there is an area labelled `000000e4 <__trampolines_start>:` which contains a list of jmp instructions, e.g.
1e8: 0c 94 46 23 jmp 0x468c ; 0x468c <_ZN11DCCEXParser10callback_REi>
I think these are inserted by the compiler or linker so that near addresses can be used as function pointers. The near address points into the trampoline area. When a call is made to the near address, the jmp at that location 'bounces' the program to the correct function (using JMP instruction). Note that the JMP and CALL instructions use word addresses of 22 bits, so are able to cover the whole of the Flash memory.
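As a cross-check on the word-address point, the four bytes in the objdump line can be decoded by hand. Below is a minimal host-side C++ sketch; `decodeJmpTarget` is a made-up helper name, and the bit layout used is the JMP encoding from the AVR instruction set manual (1001 010k kkkk 110k followed by 16 more k bits).

```cpp
#include <cassert>
#include <cstdint>

// Decodes a 32-bit AVR JMP given as the four bytes objdump prints
// (little-endian: low byte of each 16-bit word first).
// Returns true and writes the target byte address if the bytes encode JMP.
bool decodeJmpTarget(const uint8_t bytes[4], uint32_t *byteAddrOut) {
  assert(bytes != nullptr && byteAddrOut != nullptr);
  uint16_t w0 = static_cast<uint16_t>(bytes[0] | (bytes[1] << 8));
  uint16_t w1 = static_cast<uint16_t>(bytes[2] | (bytes[3] << 8));
  // Fixed opcode bits of JMP; the k bits are masked out before comparing.
  if ((w0 & 0xFE0Eu) != 0x940Cu) {
    return false;
  }
  // Assemble the 22-bit word address: k21..k17 from bits 8..4, k16 from bit 0,
  // k15..k0 from the second word.
  uint32_t wordAddr = (static_cast<uint32_t>((w0 >> 4) & 0x1Fu) << 17) |
                      (static_cast<uint32_t>(w0 & 0x1u) << 16) |
                      w1;
  *byteAddrOut = wordAddr * 2u;  // objdump shows byte addresses
  return true;
}
```

Feeding it the bytes `0c 94 46 23` gives word address 0x2346, which doubles to the byte address 0x468c shown by objdump.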
So I think that there isn't currently an issue, and won't be until the size of PROGMEM, F("") strings and other low memory stuff (vectors, trampolines etc) exceeds 64 KB (25% of total flash in the Mega). If (or when) this happens, we may have to consider placing large flash arrays into another section of flash, above the 64 KB boundary, and accessing them exclusively using far pointers.
This sounds quite plausible, does anyone have any better knowledge that can confirm or correct my analysis?
Some background on the 2560 instruction set.
LPM (used by pgm_read_byte_near): Loads one byte pointed to by the Z-register into the destination register Rd.
This instruction features a 100% space effective constant initialization or constant data fetch. The Program memory is organized in 16-bit words while the Z-pointer is a byte address. Thus, the least significant bit of the Z-pointer selects either low byte (ZLSB = 0) or high byte (ZLSB = 1). This instruction can address the first 64KB (32K words) of Program memory.
ELPM (used by pgm_read_byte_far): Loads one byte pointed to by the Z-register and the RAMPZ Register in the I/O space, and places this byte in the destination register Rd. This instruction features a 100% space effective constant initialization or constant data fetch. The Program memory is organized in 16-bit words while the Z-pointer is a byte address. Thus, the least significant bit of the Z-pointer selects either low byte (ZLSB = 0) or high byte (ZLSB = 1). This instruction can address the entire Program memory space. The Z-pointer Register can either be left unchanged by the operation, or it can be incremented. The incrementation applies to the entire 24-bit concatenation of the RAMPZ and Z-pointer Registers.
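The post-increment behaviour across the 64 KB boundary is the part that differs from LPM, so here is a minimal host-side C++ model of it. `FlashPointer`, `byteAddress` and `elpmPostIncrement` are hypothetical names, flash is modelled as a vector of 16-bit words, and this is only a sketch of the behaviour described above, not a cycle-accurate simulation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the RAMPZ:Z register pair.
struct FlashPointer {
  uint8_t rampz;  // bits 23..16 of the byte address
  uint16_t z;     // bits 15..0 of the byte address
};

// The 24-bit byte address formed by concatenating RAMPZ and Z.
uint32_t byteAddress(const FlashPointer &p) {
  return (static_cast<uint32_t>(p.rampz) << 16) | p.z;
}

// Models "ELPM Rd, Z+": read one byte, then increment the whole 24-bit RAMPZ:Z.
uint8_t elpmPostIncrement(const std::vector<uint16_t> &flashWords, FlashPointer &p) {
  uint32_t addr = byteAddress(p);
  assert((addr >> 1) < flashWords.size());  // stay inside the modelled flash
  uint16_t word = flashWords[addr >> 1];
  // The least significant address bit selects the low or high byte of the word.
  uint8_t value = (addr & 1u) ? static_cast<uint8_t>(word >> 8)
                              : static_cast<uint8_t>(word & 0xFFu);
  uint32_t next = (addr + 1u) & 0xFFFFFFu;  // the carry out of Z goes into RAMPZ
  p.rampz = static_cast<uint8_t>(next >> 16);
  p.z = static_cast<uint16_t>(next & 0xFFFFu);
  return value;
}
```

Starting at RAMPZ = 0, Z = 0xFFFF, a single read leaves the pointer at RAMPZ = 1, Z = 0x0000, which is exactly the step a 16-bit near pointer cannot take.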