rizin
rizin copied to clipboard
Refactor asm string building (`disasm.c`)
Is your feature request related to a problem? Please describe.
The asm string building functions are of low quality (performance wise and general code quality).
E.g. ds_build_op_str()
is 130 LOC, quite deeply nested, manipulate values which don't belong to it (core->parser
) and allocates and frees ds->opstr
at a minimum 4 times, if a number is replaced with a symbol name.
The whole building of an asm string (be it the colored or black & white version) should be refactored. It shouldn't work, at a minimum, on raw strings but on our asm-tokens. Direct string manipulations are error prone, because the string can be a colored or a b&w string. Making pattern searches more difficult than they need to be.
Describe the solution you'd like
A sensible route to follow could be:
- Implement tokenization for all architectures.
- Make a list of string substitutions
ds_build_op_str()
does on the string. - Implement a function for each of those, without doing them.
- Design a way where those manipulations are saved. (queue, in the asm tokens or something else?).
- Figure out what to do with conflicts, if two substitutions effect the same token.
- Assemble the final string.
Describe alternatives you've considered
It's pretty much open to discussion.
Additional context
filter.c
must probably be refactored in this process (https://github.com/rizinorg/rizin/issues/2766). There seems to be a heavy dependency on it.
I'll take this
@LukeTheEngineer have you had a chance to look into this?
@LukeTheEngineer have you had a chance to look into this?
Sorry, I've been busy having just graduated high school. In the meantime, I've been studying the code base to understand what does what do I can actually implement something feasible.