6502bench

Can't mark data as code

Open FungusN0 opened this issue 10 months ago • 6 comments

I'm disassembling some code that has a jump table at the start. The disassembler recognizes only the first JMP, and the routine it points to, as valid code. Everything after that is marked as data, and there is no function such as "analyze code" or "mark as code".

FungusN0 avatar Feb 16 '25 02:02 FungusN0

Select the next piece that starts with $4c (the JMP opcode), right-click, and select "tag address as code start point" (or hit Ctrl+H Ctrl+C). Repeat for each remaining entry.

The "repeat" part is sub-optimal. This was actually requested a while back (https://github.com/fadden/6502bench/issues/22) as a fully-automated recognition feature. It might be better as a special behavior of the tag feature, where you select the block of JMP instructions and a single "tag" operation properly handles the whole block.
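
The block-tagging idea can be sketched roughly as follows. This is only an illustration in Python, not SourceGen code: the function name `jmp_table_entry_points` and the sample bytes are assumptions. It walks a selected range in 3-byte steps and reports each offset that holds a JMP absolute opcode ($4C), which are the offsets a single "tag" operation would need to mark as code start points.

```python
JMP_ABS = 0x4C  # 6502 opcode for JMP absolute (3 bytes: opcode, low, high)


def jmp_table_entry_points(data: bytes, start: int, length: int) -> list[int]:
    """Return offsets of consecutive JMP instructions in data[start:start+length].

    Hypothetical helper: it stops at the first 3-byte slot that does not
    begin with $4C, so a selection that runs past the table is not over-tagged.
    """
    offsets = []
    end = min(start + length, len(data))
    pos = start
    while pos + 3 <= end and data[pos] == JMP_ABS:
        offsets.append(pos)
        pos += 3
    return offsets


# Three JMPs followed by an RTS ($60)
table = bytes([0x4C, 0x00, 0x10, 0x4C, 0x20, 0x10, 0x4C, 0x40, 0x10, 0x60])
print(jmp_table_entry_points(table, 0, len(table)))  # [0, 3, 6]
```

Stopping at the first non-JMP slot is a design choice for the sketch; a real implementation might instead reject the selection or ask the user.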

fadden avatar Feb 16 '25 04:02 fadden

I agree that is suboptimal. Should be able to select blocks of anything and mark them as code, text, data, vectors, inline, and other common things in 6502. That would greatly speed up reversing things.

Some smaller intelligences could be made too, in terms of code that uses inline data after a JSR to a function that uses the return address+1 as a pointer to that data. This is tangential however.

FungusN0 avatar Feb 16 '25 06:02 FungusN0

Scott,

Glad I lured you over to 6502bench :)

You can already format tables as addresses, including ones stored with a -1 offset, so that function is already there.
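
For context, and assuming the "-1" refers to tables that hold each target address minus one (the usual layout when entries are pushed and dispatched with RTS, which adds one to the address it pulls), decoding such a table might look like this in Python. The helper name and the sample bytes are made up for illustration.

```python
def decode_rts_table(data: bytes, count: int) -> list[int]:
    """Decode `count` little-endian 16-bit entries stored as (target - 1)."""
    targets = []
    for i in range(count):
        lo, hi = data[2 * i], data[2 * i + 1]
        stored = lo | (hi << 8)
        targets.append((stored + 1) & 0xFFFF)  # RTS resumes at stored + 1
    return targets


table = bytes([0xFF, 0x0F, 0x1F, 0x10])  # stored values $0FFF and $101F
print([f"${t:04X}" for t in decode_rts_table(table, 2)])  # ['$1000', '$1020']
```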

BacchusFLT avatar Feb 16 '25 10:02 BacchusFLT

I agree that is suboptimal. Should be able to select blocks of anything and mark them as code, text, data, vectors, inline, and other common things in 6502. That would greatly speed up reversing things.

Marking ranges as various types of data (e.g. 16-bit address vectors), and as inline data, is supported. Code is more tricky than the others because you don't want to mark the entire block as code, but rather mark all code entry points. This is because SourceGen uses code tracing to find code areas, rather than simply "color coding" ranges. Also, sometimes you actually do need to mark multiple bytes of a single instruction as entry points, e.g. when a BIT instruction is used to wrap an immediate load.
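
To illustrate the difference between tracing and "color coding", here is a toy tracer in Python. It is a sketch under heavy assumptions, not how SourceGen is implemented: it knows only four opcodes, and the names `LENGTHS` and `trace` are invented. It follows control flow from the tagged entry points, and the sample bytes use the BIT trick, so the hidden `LDA #$02` at $1003 is found only when that address is tagged as well.

```python
# Instruction lengths for the few opcodes this sketch understands.
LENGTHS = {
    0xA9: 2,  # LDA #imm
    0x2C: 3,  # BIT abs
    0x4C: 3,  # JMP abs
    0x60: 1,  # RTS
}


def trace(data: bytes, base: int, entry_points: list[int]) -> set[int]:
    """Return addresses of instruction starts reachable from entry_points."""
    starts = set()
    work = list(entry_points)
    while work:
        addr = work.pop()
        off = addr - base
        if addr in starts or not 0 <= off < len(data):
            continue
        opcode = data[off]
        if opcode not in LENGTHS or off + LENGTHS[opcode] > len(data):
            continue  # unknown or truncated: leave it as data
        starts.add(addr)
        if opcode == 0x4C:    # JMP: follow the target, no fall-through
            work.append(data[off + 1] | (data[off + 2] << 8))
        elif opcode != 0x60:  # RTS ends the path; the rest fall through
            work.append(addr + LENGTHS[opcode])
    return starts


BASE = 0x1000
# 1000: A9 01     LDA #$01
# 1002: 2C A9 02  BIT $02A9   (hides LDA #$02 at $1003)
# 1005: 60        RTS
code = bytes([0xA9, 0x01, 0x2C, 0xA9, 0x02, 0x60])

print([hex(a) for a in sorted(trace(code, BASE, [0x1000]))])
# ['0x1000', '0x1002', '0x1005']
print([hex(a) for a in sorted(trace(code, BASE, [0x1000, 0x1003]))])
# ['0x1000', '0x1002', '0x1003', '0x1005']
```

The second call shows why tagging operates on entry points rather than ranges: $1002 and $1003 are both valid instruction starts inside the same three bytes.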

Some smaller intelligences could be made too, in terms of code that uses inline data after a JSR to a function that uses the return address+1 as a pointer to that data. This is tangential however.

Inline data that follows a JSR can be formatted with extension scripts (https://6502bench.com/sgtutorial/extension-scripts.html). Some basic ones for addresses and strings are provided.

fadden avatar Feb 16 '25 15:02 fadden

Glad I lured you over to 6502bench :)

Pontus,

Actually Grue did ;)

Marking ranges as various types of data (e.g. 16-bit address vectors), and as inline data, is supported. Code is more tricky than the others because you don't want to mark the entire block as code, but rather mark all code entry points. This is because SourceGen uses code tracing to find code areas, rather than simply "color coding" ranges. Also, sometimes you actually do need to mark multiple bytes of a single instruction as entry points, e.g. when a BIT instruction is used to wrap an immediate load.

OK, maybe the heuristics could be made a little smarter?

Some smaller intelligences could be made too, in terms of code that uses inline data after a JSR to a function that uses the return address+1 as a pointer to that data. This is tangential however.

Inline data that follows a JSR can be formatted with extension scripts (https://6502bench.com/sgtutorial/extension-scripts.html). Some basic ones for addresses and strings are provided.

That should maybe be part of the tool, since it's a very common thing to do, as are the others I mentioned. I view needing scripts as something for special cases like decryption or obfuscation that is program dependent.

FungusN0 avatar Feb 16 '25 20:02 FungusN0

Inline data that follows a JSR can be formatted with extension scripts (https://6502bench.com/sgtutorial/extension-scripts.html). Some basic ones for addresses and strings are provided.

That should maybe be part of the tool, since it's a very common thing to do, as are the others I mentioned. I view needing scripts as something for special cases like decryption or obfuscation that is program dependent.

My experience has been that inline data following a JSR is often fairly custom. The built-in script handles a lot of situations, but it's not uncommon to follow the JSR with a structure, like a two-byte text position before the string data. Apple ProDOS system calls are JSRs followed by a command code and an address, and it's useful to follow the address to format the parameter block there as well. I didn't want to have one mechanism for "simple" things and another for "complex" things.
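
As a concrete picture of the ProDOS case, here is a rough Python sketch of recognizing one such call. It is not a SourceGen extension script (those use their own API), and the function name and sample bytes are illustrative assumptions. It relies only on the documented call layout: `JSR $BF00`, one command byte, then a two-byte little-endian pointer to the parameter block.

```python
MLI_ENTRY = 0xBF00  # ProDOS machine language interface entry point
JSR = 0x20          # 6502 opcode for JSR absolute


def parse_mli_call(data: bytes, off: int):
    """If data[off:] holds `JSR $BF00` plus inline data, return
    (command, param_addr, next_off); otherwise return None."""
    if off + 6 > len(data) or data[off] != JSR:
        return None
    target = data[off + 1] | (data[off + 2] << 8)
    if target != MLI_ENTRY:
        return None
    command = data[off + 3]                            # inline command code
    param_addr = data[off + 4] | (data[off + 5] << 8)  # parameter block
    return command, param_addr, off + 6                # code resumes here


# JSR $BF00 / command $C8 / parameter block at $2010 / RTS
call = bytes([0x20, 0x00, 0xBF, 0xC8, 0x10, 0x20, 0x60])
cmd, params, resume = parse_mli_call(call, 0)
print(f"command=${cmd:02X} params=${params:04X} resume_offset={resume}")
# command=$C8 params=$2010 resume_offset=6
```

A formatter would then mark the three inline bytes as data, continue tracing at the resume offset, and could follow `param_addr` to format the parameter block, which is the kind of custom follow-up a fixed built-in rule handles poorly.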

fadden avatar Feb 17 '25 00:02 fadden

Improved tagging of JMP tables has been added (see #22).

fadden avatar Jul 11 '25 15:07 fadden