Improve Address Table Formatter

Open fadden opened this issue 1 year ago • 0 comments

(There have been a number of requests for enhancements to the address table formatter. I'm collecting them here.)

The "Format Address Table" feature formats tables of addresses. These can come in a variety of forms, such as a list of 16-bit addresses:

jmptab   .dd2    func1
         .dd2    func2
         .dd2    func3

8-bit software often splits the high and low bytes into parallel arrays, like this:

jmptabl  .dd1    <func1
         .dd1    <func2
         .dd1    <func3
jmptabh  .dd1    >func1
         .dd1    >func2
         .dd1    >func3

The purpose of the formatter is to format the bytes as symbolic operands. If the referenced address has a label, we use that. If it doesn't, a new label is generated ("Txxxx").
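As a rough illustration of that lookup-or-generate step, here is a minimal Python sketch. It is not SourceGen code: the `label_for` helper and the `labels` dictionary are hypothetical, and it assumes the "xxxx" in "Txxxx" is the four-digit hex address.

```python
def label_for(addr, labels):
    """Return the existing label for addr, or generate a "Txxxx" label."""
    if addr not in labels:
        # Assumption: the generated name is "T" plus the 4-digit hex address.
        labels[addr] = "T{:04X}".format(addr)
    return labels[addr]


# One address already has a label, the other does not.
labels = {0x1000: "func1"}
table = [0x1000, 0x1234]
operands = [label_for(addr, labels) for addr in table]
```

After this runs, `operands` holds the existing label for the first entry and a newly generated one for the second, and the new label has been added to `labels`.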

Tables can be 8-bit, 16-bit words, 16-bit split, 24-bit words, or 24-bit split. The high byte in a 16-bit or 24-bit address can be set to a fixed value. When the table is split into low/high/bank parts, the various pieces don't have to be adjacent in memory. They do need to be the same size, however.
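To make the split layout concrete, the following Python sketch reassembles addresses from parts that sit at unrelated file offsets. The function name and parameters are hypothetical, and the fixed-high-byte option is left out to keep it short.

```python
def split_table_addresses(data, low_off, high_off, count, bank_off=None):
    """Combine parallel low/high(/bank) byte arrays into full addresses.

    The parts may live anywhere in `data`; they only need to share `count`.
    """
    addrs = []
    for i in range(count):
        low = data[low_off + i]
        high = data[high_off + i]
        # Without a bank part, treat the table as 16-bit (bank 0).
        bank = data[bank_off + i] if bank_off is not None else 0
        addrs.append((bank << 16) | (high << 8) | low)
    return addrs


# Low bytes at offset 0, one unrelated byte, high bytes at offset 4,
# bank bytes at offset 7.
data = bytes([0x00, 0x34, 0x78, 0xFF, 0x10, 0x12, 0x56, 0x01, 0x02, 0x03])
addrs16 = split_table_addresses(data, low_off=0, high_off=4, count=3)
addrs24 = split_table_addresses(data, low_off=0, high_off=4, count=3, bank_off=7)
```

Here `addrs16` comes out as $1000, $1234, $5678, and `addrs24` adds the bank byte on top of each.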

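For jump tables, the formatter can also mark the target addresses as code entry points. It's common to load the values, push them on the stack, and RTS; because RTS resumes execution at the pulled address plus one, such tables store each target address minus one, so the target address may need to be adjusted by one byte.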
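The one-byte adjustment can be sketched in Python as follows. The function name is hypothetical; the only fact it relies on is that the 6502 RTS instruction continues at the pulled address plus one.

```python
def rts_targets(stored_values):
    """Convert values stored for a push/push/RTS dispatch into real targets.

    RTS resumes at (pulled address + 1), so the table holds target - 1.
    The result wraps within the 16-bit address space.
    """
    return [(value + 1) & 0xFFFF for value in stored_values]


stored = [0x0FFF, 0x1233]
targets = rts_targets(stored)
```

With these inputs, `targets` is $1000 and $1234, which are the addresses that would be marked as code entry points.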

The existing implementation does not create an entity that defines the location and structure of the table. Instead, it performs a one-time bulk format of the various items as a single operation. It doesn't do anything that couldn't be done manually; it just does it faster. The output shown in the format dialog is just a preview, to make it easier to tell if the parameters are correct.

Areas for improvement:

Look for matches outside the project (issue #130). Match against project/platform symbols, and generate symbolic references to them when the table is generated. (This has come up more than once, e.g. issue #149.) [Done, in v1.9.0-dev3]
Automatically generate project symbols for external references ("ETxxxx") (issue #130). Note that project symbol renames are currently not refactoring renames.
Show a warning when the labels we're about to generate would end up hidden inside a multi-byte data area.
Allow specification of a fixed offset, for the benefit of address tables that are stored as offset tables (issue #143).
For split tables, format addresses with offsets based on the whole symbol, rather than part of the symbol's address (issue #130). A reference to MyLabel+280 should be <(MyLabel+280) and >(MyLabel+280), not <MyLabel+24 and >MyLabel+1.
Make the setting of code start points more dynamic. Remove stale table-generated code start points when a table is reformatted (issue #154).
Format split tables as if the operands were references to a 16-bit value, rather than a pair of 8-bit values. (See https://github.com/fadden/6502bench/issues/130#issuecomment-1120434890 .) This is hard because the bytes are formatted individually, so a reference to $0560 in a 1024-byte region that starts at $0400 will be formatted as <(SYM+$60) and >(SYM+$01) rather than <(SYM+$0160) and >(SYM+$0160). (A minor change can make the high byte >(SYM+$0100), but the low byte reference can't be fixed so easily. We may need to add a "full value" field to the data formatter entry.)
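The arithmetic behind the split-table offset items can be sketched in Python. This is not how SourceGen computes operands; both helpers are hypothetical and only contrast the adjustment seen when each byte is considered alone with the adjustment seen when the full address is known.

```python
def per_byte_adjustments(sym_addr, target):
    """Adjustments visible when the low and high bytes are handled separately."""
    low_adj = (target & 0xFF) - (sym_addr & 0xFF)
    high_adj = ((target >> 8) & 0xFF) - ((sym_addr >> 8) & 0xFF)
    return low_adj, high_adj


def whole_value_adjustment(sym_addr, target):
    """Adjustment visible when the full 16-bit target is known."""
    return target - sym_addr


# The $0560 reference into a region labeled SYM at $0400.
sym = 0x0400
target = 0x0560
separate = per_byte_adjustments(sym, target)
whole = whole_value_adjustment(sym, target)
```

For this case `separate` is ($60, $01) while `whole` is $0160. The low byte on its own carries no information about the $0100 part, which is why a "full value" field (or something like it) would be needed to produce the whole-symbol form.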

See also issue #141 (reference to relocated code).

Different Approach

Most of the requests can be satisfied with the existing format-and-forget approach. However, there may be value in switching to a mechanism that makes address tables an actual object that is re-evaluated every time the analyzer runs.

This came up here and in some other discussions. It offers two advantages:

You can undo things by modifying the table. The target labels and code start points are set by the table, not fixed in place. Standard "Lxxxx" auto-labels could be used instead of the "Txxxx" form. Project symbols could be generated that would go away if the table changed.
More context may be available at formatting time.

The trouble with this approach is that, if you create an object in the project, you will want a way to edit it. Specifically, a way to override the behavior for specific entries. Sometimes tables have junk in them, e.g. if valid inputs are the letters "abef", the table might have entries for all of a,b,c,d,e,f, with junk values for c/d since they can never be used. Without a way to flag those as junk, the disassembler might try to chase some bad references.
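One possible shape for such an override, as a Python sketch under assumptions: the `AddressTable` class and its `junk` field are hypothetical, and the addresses are made up to mirror the "abef" example.

```python
from dataclasses import dataclass, field


@dataclass
class AddressTable:
    """Hypothetical table object with per-entry junk flags."""
    addresses: list
    junk: set = field(default_factory=set)  # indices of entries to ignore

    def targets(self):
        """Addresses the analyzer would be allowed to follow."""
        return [addr for i, addr in enumerate(self.addresses)
                if i not in self.junk]


# Entries for a..f, where c and d (indices 2 and 3) hold junk values.
table = AddressTable(
    addresses=[0x1000, 0x1010, 0xFFFF, 0x0000, 0x1020, 0x1030],
    junk={2, 3},
)
followed = table.targets()
```

Only the four valid entries end up in `followed`, so the junk values are never treated as references.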

Larger changes, like expanding or contracting the table, might need to be handled by deleting and regenerating the table.

We also need to figure out how to store the table and how to handle conflicts. It probably needs to be a high-level object, rather than a format descriptor, because it potentially spans a non-contiguous range of bytes. Conflict resolution is mostly a matter of deciding whether the table gets handled before the formatting associated with individual offsets. We'd want to disallow creation of overlapping address table objects, and reject overlaps found in the project file.
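A sketch of the overlap check, in Python and under assumptions: each table is represented here as a list of hypothetical (start offset, length) pieces, since a split table covers more than one range.

```python
def covered_offsets(pieces):
    """Set of file offsets covered by a table's (start, length) pieces."""
    offsets = set()
    for start, length in pieces:
        offsets.update(range(start, start + length))
    return offsets


def tables_overlap(a, b):
    """True if any byte belongs to both tables."""
    return not covered_offsets(a).isdisjoint(covered_offsets(b))


# A split table with low bytes at $100 and high bytes at $200.
table_a = [(0x100, 3), (0x200, 3)]
table_b = [(0x202, 3), (0x300, 3)]  # shares offset $202 with table_a
table_c = [(0x103, 3)]              # starts right after table_a's low part
conflict = tables_overlap(table_a, table_b)
no_conflict = tables_overlap(table_a, table_c)
```

The same test could be used both when a new table is created and when tables are loaded from the project file.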

Jun 01 '24 20:06 fadden