masm2c
Translator from assembly code of the program into C language
Hello, xor2003
I leave the link below; it may complement your project.
Greetings.
Cicoparser https://github.com/gabonator/Education/tree/master/2021/CicoParser
CicoParser is a set of tools for conversion of IBM PC DOS applications into modern operating systems. Instead of emulation of the computer CPU, memory and peripherals, CicoParser translates assembly code of the program into C language and therefore achieves much higher performance than emulation.
Thanks, it is a very similar project. I will investigate it carefully. Lately I was able to reuse DOSBox as a hardware emulation library, which reduced porting time to several days. For example, I have already ported Test Drive 3...
I mean, I can be sure the converted source works like the original because there is a debug mode in which I compare the results of each instruction when running under DOSBox and in the converted program.
I am glad to hear that, xor2003.
DOSBox-X fork of the DOSBox project https://github.com/joncampbell123/dosbox-x DOSBox-X is a cross-platform DOS emulator based on the DOSBox project
DOSBox-X based IDA debugger https://github.com/lab313ru/dsbxida DoSBoXIDA project is based on Dosbox-X and it allows you to debug MS-DOS programs in IDA.
By the way: I spent tens of hours with WarCraft Orc and Humans, Dune 2: Building of a dynasty, Command and Conquer, TIM: The Incredible Machine, The Legend of Kyrandia, Monkey Island, ...
Yes, I tried the DOSBox IDA debugger too. It turns out that with my tool there is not much need for DOS or DOSBox debuggers. We can use gdb or instrument the generated C++.
How the instrumented debug execution of the converted code works:
- Data segments are checked before the program starts: what DOSBox loaded from disk is compared with the data that was translated.
- The hardware emulation is provided by DOSBox.
- The result of each translated instruction is checked against what DOSBox emulates. So if, for example, IDA picked the wrong segment or converted data incorrectly, or my converter made a mistake, we will see it immediately.
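The checks listed above could look roughly like the sketch below. It is only an illustration and not masm2c's actual instrumentation: `CpuState`, `diff_state`, `first_data_mismatch` and `first_divergence` are hypothetical names, and the register set is reduced to the eight general-purpose 16-bit registers plus flags.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical snapshot of the 16-bit register state after one instruction.
struct CpuState {
    std::array<std::uint16_t, 8> regs; // ax, cx, dx, bx, sp, bp, si, di
    std::uint16_t flags;
    CpuState() : regs(), flags(0) {}
};

// Name of the first register that differs, or an empty string if the states agree.
std::string diff_state(const CpuState& translated, const CpuState& reference) {
    static const char* const names[8] = {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"};
    for (std::size_t i = 0; i < translated.regs.size(); ++i) {
        if (translated.regs[i] != reference.regs[i]) return names[i];
    }
    if (translated.flags != reference.flags) return "flags";
    return "";
}

// Offset of the first differing byte between the translated data segment and
// the image the emulator loaded from disk, or -1 if they are identical.
long first_data_mismatch(const std::vector<std::uint8_t>& translated,
                         const std::vector<std::uint8_t>& loaded) {
    const std::size_t n = std::min(translated.size(), loaded.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (translated[i] != loaded[i]) return static_cast<long>(i);
    }
    return translated.size() == loaded.size() ? -1 : static_cast<long>(n);
}

// Index of the first instruction after which the two traces diverge, or -1.
long first_divergence(const std::vector<CpuState>& translated,
                      const std::vector<CpuState>& reference) {
    assert(translated.size() == reference.size()); // same instruction count in both runs
    for (std::size_t i = 0; i < translated.size(); ++i) {
        if (!diff_state(translated[i], reference[i]).empty()) return static_cast<long>(i);
    }
    return -1;
}
```

In a real lockstep run the comparison would happen after every instruction rather than on recorded traces, so the first mismatch stops execution right where the translation went wrong.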
By converting the IDA-generated .lst file we make sure that:
- The variables have the names that IDA assigned automatically or that the developer assigned using the IDA GUI.
- The unknown values are left as is, and data and code addresses are the same for translated code and data. So the translated program behaves the same way as when executed under DOSBox.
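One way an address-preserving data layout might be expressed in C++ is sketched below. This is an assumption for illustration, not necessarily what masm2c emits: the struct `DataSeg`, its field names and its offsets are all made up. Unknown bytes are kept as padding arrays so that every named variable stays at its original offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical image of one data segment. Field order and sizes mirror the
// listing, so every variable keeps the offset it had in the original program.
#pragma pack(push, 1)
struct DataSeg {
    std::uint8_t  unknown_0[0x10];  // bytes that were never classified, kept verbatim
    std::uint16_t score;            // name given by the developer in the IDA GUI (made up)
    std::uint8_t  byte_12;          // IDA-style automatic name (made up)
    std::uint8_t  unknown_13[0x0D]; // more unclassified bytes
};
#pragma pack(pop)

// The compiler verifies that the named fields sit at their original offsets.
static_assert(offsetof(DataSeg, score) == 0x10, "score must stay at offset 0x10");
static_assert(offsetof(DataSeg, byte_12) == 0x12, "byte_12 must stay at offset 0x12");
static_assert(sizeof(DataSeg) == 0x20, "segment size must match the listing");

// Read a little-endian word at a raw offset, the way code that computes
// addresses at run time would access the segment.
std::uint16_t read_word(const DataSeg& ds, std::size_t offset) {
    assert(offset + 1 < sizeof(DataSeg));
    const std::uint8_t* raw = reinterpret_cast<const std::uint8_t*>(&ds);
    return static_cast<std::uint16_t>(raw[offset] | (raw[offset + 1] << 8));
}
```

Because named access (`ds.byte_12`) and raw offset access (`read_word(ds, 0x12)`) hit the same bytes, code that does pointer arithmetic on data addresses keeps working after translation.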
Cool games. What are your results with TIM?
I have not seen TIM yet. Other nice old-school games: Heretic, X-COM, Alone In The Dark, 100% vice! I really like your research ;-) It is amazing.
Have a look at this: https://github.com/M-HT/SR A project to statically recompile the following games to create Windows or Linux (x86 or arm) versions of the games - Albion, X-Com: UFO Defense (UFO: Enemy Unknown), X-Com: Terror from the Deep, Warcraft: Orcs & Humans, Septerra Core: Legacy of the Creator, Battle Isle 3: Shadow of the Emperor.
Cheers.
Hi maximilien-noal,
I did not know about your application Spice86; you have just introduced me to it. https://github.com/OpenRakis/Spice86
Cheers.
Hello,
We are trying to reverse engineer the DOS version of Cryo Dune.
The game is hand written in assembly and does a lot of "weird" things. For example:
- It rewrites its code dynamically
- It modifies the call stack to change the return addresses of callers, or to use RET as jump (no idea why they did that)
- Functions have several entry points and code can sometimes jump in the middle of them or call them by jumping
- There are no calling conventions
I wonder whether masm2c or Cicoparser could translate this to C reliably. It would be interesting to test.
The approach I took initially was a bit different: I wrote an emulator (https://github.com/kevinferrare/spice86/) that allows you to replace functions or even assembly blocks with code in a higher-level language. Replacing the code bit by bit means you always have a working program and can revert to emulated code to track down bugs.
I made some progress on Dune by translating code by hand, but this is very slow, so now with @maximilien-noal we are trying to build a hybrid of the two approaches, https://github.com/OpenRakis/Spice86 (with a change of language in the middle :) ).
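The replace-and-revert idea can be sketched as an address-keyed override table. The code below is a toy under stated assumptions and not Spice86's API: `HybridEmulator`, `replace`, `revert` and the one-instruction "interpreter" are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Minimal CPU state for the sketch.
struct Cpu {
    std::uint16_t ax = 0;
    std::uint32_t ip = 0;
};

class HybridEmulator {
public:
    typedef std::function<void(Cpu&)> Override;

    // Register hand-written code to run instead of the block at `address`.
    void replace(std::uint32_t address, Override fn) {
        assert(static_cast<bool>(fn));
        overrides_[address] = fn;
    }

    // Go back to emulating the original code at `address`.
    void revert(std::uint32_t address) { overrides_.erase(address); }

    // Run the override registered for the current address if there is one,
    // otherwise fall back to interpreting the original instruction.
    void step(Cpu& cpu) {
        std::map<std::uint32_t, Override>::iterator it = overrides_.find(cpu.ip);
        if (it != overrides_.end()) {
            it->second(cpu);
            ++overridden_steps;
            return;
        }
        interpret(cpu);
    }

    int overridden_steps = 0;

private:
    // Stand-in for real decoding: every emulated "instruction" adds 1 to ax.
    void interpret(Cpu& cpu) {
        cpu.ax += 1;
        cpu.ip += 1;
    }

    std::map<std::uint32_t, Override> overrides_;
};
```

For example, `emu.replace(0, [](Cpu& c) { c.ax += 3; c.ip = 3; });` stands in for the three toy instructions at addresses 0 to 2, and `emu.revert(0)` restores emulation of that block when a bug needs to be tracked down.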
I would be glad to discuss with you two, we have a discord channel here if you want to talk 16bit assembly 😄 https://discord.gg/vxA8TYpbGZ
- It rewrites its code dynamically - this one you will need to handle manually. masm2c might detect these points. I hope there is not much of it.
- It modifies the call stack to change the return addresses of callers, or to use RET as jump (no idea why they did that) - this one is handled by masm2c. If the execution flow is terminated by a ret or jmp instruction, masm2c will automatically insert a label. All labels and procedures are in the jump dispatcher and the global call dispatcher. Also, if the translated code is organised in a single C++ function, RET can be handled by the jump dispatcher.
- Functions have several entry points and code can sometimes jump in the middle of them or call them by jumping - this one is handled by masm2c. You will need to find the procedure joining/merging mode and specify it as a switch.
- There are no calling conventions - this is totally fine with masm2c.
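To illustrate the dispatcher idea from the answers above, here is a minimal sketch of a single C++ function in which every label is a `switch` case and RET is just another dispatch. It is not masm2c's generated code: the labels, the `Machine` struct and the toy routine are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical label ids; a real translator would derive them from the listing.
enum Label { kStart, kAddTwo, kAddOne, kAfterCall, kExit };

struct Machine {
    std::uint16_t ax;
    std::vector<int> stack; // return "addresses", stored as label ids
    Machine() : ax(0) {}
};

// Pop a return address, as RET would.
static int pop_label(Machine& m) {
    assert(!m.stack.empty());
    const int target = m.stack.back();
    m.stack.pop_back();
    return target;
}

// All translated code lives in one function, so any label can be reached
// from anywhere, including via a return address that was pushed by hand.
void run(Machine& m, int entry) {
    int pc = entry;
    for (;;) {
        switch (pc) {
        case kStart:
            m.stack.push_back(kAfterCall); // CALL add_two: push the return address
            pc = kAddTwo;
            break;
        case kAddTwo:                      // first entry point of the routine
            m.ax += 1;
            pc = kAddOne;                  // continue into the second entry point
            break;
        case kAddOne:                      // second entry point, mid-routine
            m.ax += 1;
            pc = pop_label(m);             // RET: dispatch to whatever is on the stack
            break;
        case kAfterCall:
            m.stack.push_back(kExit);      // hand-made return address...
            pc = kAddOne;                  // ...then JMP mid-routine: a "call by jumping"
            break;
        case kExit:
            return;
        default:
            assert(false && "unknown label");
            return;
        }
    }
}
```

Starting at `kStart`, the routine is first called normally and then entered in the middle by a jump with a manually pushed return address. Since nothing relies on the C++ call stack, the lack of calling conventions in the original code does not matter here.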