6502bench "Collapse to external file"

I am not really sure I have thought this all through yet, so just see this as food for thought.

If you disassemble a big binary, it will contain both code and data. If you had written the program yourself, the data segments would in many cases be pulled into the code from external files, rather than being included in the primary source file.

Let's say I have a bitmap picture, a game font, game level data or something like that. I am looking for a mechanism to select that data and define it as a segment that should be exported to a separate file, with the main assembler file containing only a reference to it.

While generating the assembler file, I am suggesting a feature saving out a selected chunk and having the assembler file contain something like .import binary "thebinaryfile.bin"

That was KickAssembler syntax, which I know isn't supported, but just as an example. For the C64, I guess we'd need a parameter to select between binary (plain raw data) and prg (which adds the load address as the first two bytes of the file), in addition to being able to choose the filename.
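The difference between the two export formats mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_segment` and its parameters are hypothetical and are not part of 6502bench.

```python
import struct


def encode_segment(data: bytes, load_address: int, as_prg: bool) -> bytes:
    """Return the bytes to write for an exported segment (hypothetical helper)."""
    if not 0 <= load_address <= 0xFFFF:
        raise ValueError("load address must fit in 16 bits")
    if as_prg:
        # PRG: 16-bit load address, low byte first, followed by the raw data.
        return struct.pack("<H", load_address) + data
    # Binary: the raw data with nothing added.
    return data
```

A PRG export is therefore always exactly two bytes longer than the raw export of the same segment.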

If the data block were also collapsed, the file would be easier to survey, as a bigger share of what is shown would be the actual code.

This would help build a proper project from the reassembly, where editing the exported segments using external tools could also start.

The part where I surely haven't thought this all through is the combination with the visualizers. There are cases where "collapse to external file" would be relevant for non-graphical elements (like music and loaders), but this could also be a way to address the request to collapse data covered by visualizers.

Feb 14 '23 11:02 BacchusFLT

I can think of two general ways to do something like this:

(1) Import raw binary.
(2) Import assembler code with bulk data instructions.

The limiting factor on (1) is likely assembler support. I'm not sure how many assemblers allow you to insert a fully-formed binary blob. Some assemblers might want to link binaries in rather than include them. (2) is easier because it just requires some version of "#include", which is pretty standard.

For (1), the labels need to be in the base file. For (2), we can have a file with multiple blobs with individual labels. In fact, for (2) we can just generally put code in other files. OTOH, for (2) we have to consider nested includes.

There's a similar open item for 65816 code, for which having everything in one file is very awkward, but that's a broader issue (and you'd very much prefer linking over including in that case to help with label scope).

Feb 14 '23 16:02 fadden

I see the value of both options 1 and 2. If both could be supported, you would also have a full framework for including external files, source files among them.

I am, however, mainly advocating option 1 here, as that would mean the external file could be edited with an external editor. The whole point is that an external editor would be able to access the data in binary format. My modus operandi today is to save the binary segments out using an emulator and then feed 6502bench only the segments containing code. I then need to create the inclusions manually. With the suggested feature I could import the entire memory dump and then sort out what is code, what is data that I want in my main file, what is data I want kept separately, and which segments are uninitialized memory or junk that I don't care about.

The assemblers whose documentation I found to include this feature:

TASS has ".binary"
ACME has "!binary"
CA65 has ".incbin"
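As a rough sketch, the directives listed above could be kept in a per-assembler table. The table name, the function name and the lookup keys are hypothetical, and "TASS" is assumed here to mean 64tass. An assembler with no known directive returns `None` so the caller can fall back to inline data.

```python
# Directive spellings as listed in the thread. The keys are illustrative
# assumptions; Merlin is absent because no equivalent was found.
BINARY_INCLUDE_DIRECTIVES = {
    "64tass": '.binary "{path}"',
    "acme": '!binary "{path}"',
    "ca65": '.incbin "{path}"',
}


def binary_include_line(assembler: str, path: str):
    """Return the include statement for `assembler`, or None if unsupported."""
    template = BINARY_INCLUDE_DIRECTIVES.get(assembler.lower())
    if template is None:
        return None
    return template.format(path=path)
```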

I have no idea how Merlin works, so that could be a restriction. I can't find how it would be done there, and it would be sad if the lowest common denominator, one assembler lacking the option, prevented a beneficial feature.

Feb 14 '23 17:02 BacchusFLT

I appear to have underestimated support among assemblers. Thank you for doing the research.

We can work around non-support by just not separating the data if the assembler can't handle it. Some source generators already have situations (usually having to do with 65816 code) where they just dump raw hex.

Generating multiple output files is already supported (needed for ca65), so that's not an issue. We'd need to have the code generator output a label and an include statement in the generated source, then copy the binary data to a separate file, and skip forward.
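The generator steps described above (emit a label and an include statement, copy the bytes to a separate file, skip forward) might look roughly like the following. Everything here is a hypothetical sketch rather than 6502bench code: the function name, the `.byte` pseudo-op used in the fallback, and the column layout are all assumptions.

```python
def emit_region(data, offset, length, label, filename, directive=None):
    """Return (source_lines, files, next_offset) for one external-data region.

    `directive` is the assembler's binary-include pseudo-op, or None if the
    assembler has no such feature, in which case the bytes are dumped inline.
    """
    chunk = data[offset:offset + length]
    next_offset = offset + length  # the caller resumes after the region
    if directive is None:
        # Fallback: assembler can't include binaries, so emit raw hex.
        lines = []
        for i in range(0, len(chunk), 8):
            row = ",".join(f"${b:02x}" for b in chunk[i:i + 8])
            prefix = label if i == 0 else ""
            lines.append(f"{prefix:<8} .byte {row}")
        return lines, {}, next_offset
    # Normal case: one labeled include line, with the bytes going to a file.
    line = f'{label:<8} {directive} "{filename}"'
    return [line], {filename: chunk}, next_offset
```

Returning the file contents in a dictionary instead of writing them immediately keeps the sketch free of side effects; a real generator would write the files alongside the assembly source.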

The uncertain part is handling it in the user interface. We need a way to identify the region of bytes to copy to the file. One approach would be to make it a new data operand format. Since the region can't have labels or comments in it, and ideally it doesn't show up on-screen as anything more than a "500 bytes here" sort of marker, that might work. You can't straddle a label or long comment with a single operand format, so we prevent mid-region labels automatically. We need a place to store the filename; the SymbolRef field might work for that, since it's a weak reference and nothing will break if the symbol can't be found. Reverting the binarification could be done by changing the data operand.

Visualizers should still work, because they're based purely on file offset, and do not require a label.

Another way might be to define a new kind of label. When encountered, everything that follows, up to the next label, is thrown into the binary file. The filename could be tucked into the end-of-line comment.
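For the label-based alternative, finding the extent of the region comes down to locating the next label after the marker. A minimal sketch, assuming labels are tracked as a collection of file offsets (the function and parameter names are hypothetical):

```python
import bisect


def region_after_label(label_offsets, start, file_length):
    """Return (start, end) covering `start` up to the next label or end of file."""
    ordered = sorted(label_offsets)
    # Index of the first label strictly after the marker label.
    i = bisect.bisect_right(ordered, start)
    end = ordered[i] if i < len(ordered) else file_length
    return start, end
```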

Generated files currently have the form "name_assembler.ext". We might not need the "_assembler" part, since the binary is not particular to any given assembler. OTOH, if multiple assemblers are in use, it might help to keep all files associated with a given assembler in a nice group.

Feb 16 '23 04:02 fadden

I would say the data operand option makes sense: a new option named something like "External data", under the Bulk Data heading. (Again, that would be for the binary export. Export to source would be something else, and if you want to treat them the same, then this would be less suitable.)

The "startlabel" until "stoplabel" option is also fine by me.

The reference to "output a label", would that mean that all references to inside the block exported would be "label+offset"? (I would very much like that!)

Can I also reiterate that collapsing the area would be much appreciated? Whether that is a separate option, which could also apply to other areas, or applies just to this one, I can't say. General is always better if it's possible.

Feb 16 '23 11:02 BacchusFLT

The reference to "output a label", would that mean that all references to inside the block exported would be "label+offset"? (I would very much like that!)

Yes. That would work the same way it does now.
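To illustrate what "label+offset" means for a reference into an exported block, here is a small sketch. The function name and the decimal formatting of the offset are assumptions; the actual operand formatting is whatever 6502bench already does.

```python
def format_reference(target, region_start, region_length, label):
    """Format a reference into an exported region as label or label+offset."""
    offset = target - region_start
    if not 0 <= offset < region_length:
        raise ValueError("target lies outside the exported region")
    # The region has exactly one label, at its first byte.
    return label if offset == 0 else f"{label}+{offset}"
```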

Can I also reiterate that collapsing the area would be much appreciated? Whether that is a separate option, which could also apply to other areas, or applies just to this one, I can't say. General is always better if it's possible.

Collapsing this type of section is easy if we do it as a distinct operand format. Presumably any data that's worth omitting from the assembly isn't all that interesting to look at. Anyone who wants to see what's actually there can double-click on the "bytes" column to view the file hex dump.

I'm not sure what to do with the HTML export, which is really just a slightly modified copy of what's shown on screen. Appending a hex dump of the imported file might make sense.

A more general show/hide mechanism would be best integrated into the main list UI. We could add a checkbox to the data operand editor that means, "only show the first line of multi-line items", but that's inconvenient to toggle, and doesn't help with something like a sprite formatted to have one graphic line per line of code (which looks nicer in the assembler, where you can't hide anything).

Feb 16 '23 23:02 fadden

Seems like you have this under control. My wishlist is now with Santa - All I can do is hope :)

Feb 17 '23 00:02 BacchusFLT

Added to "TO-DO" list.

Feb 18 '23 05:02 fadden

Available in https://github.com/fadden/6502bench/releases/tag/v1.9.0-dev2

In brief:

Works just like the other "bulk data" formatters.
Each binary include must have a unique filename.
Files can be stored in subdirectories (e.g. "sounds/stuff.bin"), but can't ascend to a parent of the project directory.
The binary files are generated at the same time as the assembly sources. Existing files with the same names will be overwritten if and only if they have the same length. This is a safety measure to avoid inadvertently overwriting other files.
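The rules above can be sketched as two small checks. This is an illustration of the described behavior only, not the 6502bench implementation; both function names are hypothetical, and details such as case sensitivity of filenames are ignored.

```python
from pathlib import Path, PurePosixPath


def validate_include_path(name: str, used: set) -> PurePosixPath:
    """Check a binary-include filename against the rules listed above."""
    path = PurePosixPath(name)
    if not path.parts:
        raise ValueError("filename must not be empty")
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("path must stay inside the project directory")
    key = str(path)
    if key in used:
        raise ValueError("each binary include needs a unique filename")
    used.add(key)
    return path


def write_binary(project_dir: Path, rel: PurePosixPath, data: bytes) -> bool:
    """Write the file; refuse to replace an existing file of another length."""
    target = project_dir.joinpath(*rel.parts)
    if target.exists() and target.stat().st_size != len(data):
        return False  # safety measure: probably not a file we generated
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return True
```

The length comparison is a cheap heuristic: a regenerated binary include always has the same size as before, so a size mismatch suggests the existing file is something else.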

Jun 01 '24 21:06 fadden