Unicorn on Windows takes 1GB of RAM when just instantiating an Emulator and registering a hook
Hello! I just create an instance and register the hook, without mapping any memory or executing a single opcode, and the process already uses 1 GB more RAM on Windows. ~~With roughly the same code in C it's only 11 MB, so I think the problem could be somewhere in the Rust code.~~ UPD: it's actually not related to the Rust bindings.
use unicorn_engine::unicorn_const as ucc;
use unicorn_engine::Unicorn;

fn eat_1gb() {
    let uni = Unicorn::new(ucc::Arch::X86, ucc::Mode::MODE_64);
    if uni.is_err() {
        println!("Unable to create unicorn instance");
        return;
    }
    let mut emu = uni.unwrap();
    let hook = emu.add_mem_hook(
        ucc::HookType::MEM_UNMAPPED,
        0,
        u64::MAX,
        |_uc, _access, _addr, _size, _value| {
            true
        },
    );
    std::thread::sleep(std::time::Duration::from_secs(1));
    println!("1GB allocated");
    emu.remove_hook(hook.unwrap()).unwrap();
}

fn main() {
    for i in 0..30 {
        eat_1gb();
        println!("Iteration {}, check ram usage...", i);
        // sleep for 1 second
        std::thread::sleep(std::time::Duration::from_secs(1));
        println!("1GB freed");
    }
}
Latest version is used:
[dependencies]
unicorn-engine = "2.0.0"

Tested the C bindings and it reproduces with 2.0.0 (commit hash 6c1cbef6ac505d355033aef1176b684d02e1eb3a). It looks like there is a gigantic 1GB RWX page allocated.
Oh, sorry for that, actually not a bindings issue. Let me rename the issue then.
This is the TCG buffer. Look at qemu/accel/tcg/translate-all.c
Not sure if this is a real issue, because the memory is only allocated and not used (not sure how Windows behaves in this case).
Yes, this is expected since it's the TCG buffer. On Windows, IIRC, the pages are allocated on demand. Meaning, even if you start several unicorn instances and allocate a few GB memory, your machine won't really run out of physical memory.
This is kind of true, but not exactly. You can reserve pages and then it’s guaranteed to not use memory.
I haven't played with VirtualAlloc for a very long time, but we do indeed use MEM_RESERVE, which I think should be enough?
I’ll confirm, but that’s not what it looked like in Process Hacker…
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Not stale
I also ran into this and looked into it a bit. The assumption that Windows will only reserve and not allocate is not true: the flags passed to VirtualAlloc are MEM_RESERVE and MEM_COMMIT, so that memory is definitely allocated. I ran into this because I wanted to emulate multiple threads by having multiple instances, and having 32 threads means it's eating 32 GiB. It might be a good idea to allow the user to specify the buffer size. I would be willing to contribute this change, but I'm uncertain which code I can modify safely without diverging too much from QEMU.
If so, what are the correct flags here?
There isn’t really a flag that does this. You could basically MEM_RESERVE a range and then register a vectored exception handler that MEM_COMMITs the ranges that you access.
This obviously only works if you don’t do stuff like memset the whole range though…
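To make the reserve-then-commit idea and its memset caveat concrete, here is a toy model in Rust. It does not call any Windows API; `LazyRegion`, `touch`, and `fill_all` are hypothetical names, and the 4 KiB page size is an assumption. It only tracks which pages a handler would have committed.

```rust
use std::collections::BTreeSet;

/// Assumed page size for the model (4 KiB).
const PAGE_SIZE: u64 = 4096;

/// Toy model of a reserved range whose pages are committed on first touch.
struct LazyRegion {
    base: u64,
    len: u64,
    committed: BTreeSet<u64>, // indices of pages committed so far
}

impl LazyRegion {
    /// Models MEM_RESERVE: address space is claimed, nothing is committed.
    fn reserve(base: u64, len: u64) -> Self {
        LazyRegion { base, len, committed: BTreeSet::new() }
    }

    /// Models what an exception handler would do on an access violation:
    /// commit the page containing `addr`. Returns false if the address is
    /// outside the region, i.e. the fault is not ours to handle.
    fn touch(&mut self, addr: u64) -> bool {
        if addr < self.base || addr - self.base >= self.len {
            return false;
        }
        self.committed.insert((addr - self.base) / PAGE_SIZE);
        true
    }

    /// Models a memset over the whole range: every page gets touched.
    fn fill_all(&mut self) {
        let mut addr = self.base;
        while addr - self.base < self.len {
            self.touch(addr);
            addr += PAGE_SIZE;
        }
    }

    /// Bytes that would be backed by memory under this model.
    fn committed_bytes(&self) -> u64 {
        self.committed.len() as u64 * PAGE_SIZE
    }
}

fn main() {
    let mut region = LazyRegion::reserve(0x1000_0000, 64 * 1024 * 1024);
    region.touch(0x1000_0000);
    region.touch(0x1000_0008); // same page as the previous access
    region.touch(0x1000_2000);
    println!("after 3 accesses: {} bytes committed", region.committed_bytes());
    region.fill_all();
    println!("after full fill:  {} bytes committed", region.committed_bytes());
}
```

In this model, sparse accesses commit only the pages they hit, while filling the whole range commits all of it, which is why the approach gains nothing if the buffer is cleared up front.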
Oh I see, I could get a fix for that.
I got a fix for this, see this for some explanation and caveats.
With this fix, each instance initially takes 512KB of memory and its memory usage grows on demand. I will keep this issue open until the next release for possible feedback.
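For a sense of scale, the following sketch works out the initial memory for the 32-instance case mentioned earlier in the thread, using the 1 GB figure from before the fix and the 512KB figure from after it. The helper `total_commit` is hypothetical, and binary units (GiB, KiB) are assumed.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;
const GIB: u64 = 1024 * MIB;

/// Total memory taken up front by `instances` emulators,
/// each taking `per_instance` bytes at creation.
fn total_commit(instances: u64, per_instance: u64) -> u64 {
    instances * per_instance
}

fn main() {
    let instances = 32;
    let before = total_commit(instances, GIB);
    let after = total_commit(instances, 512 * KIB);
    println!("before the fix: {} GiB", before / GIB);
    println!("after the fix:  {} MiB", after / MIB);
}
```

Under these assumptions the initial footprint drops from 32 GiB to 16 MiB; anything beyond that depends on how much of each buffer is actually touched.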
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Not stale 😊
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Still worth keeping it open 😊
Still not receiving any more feedback, xd
There isn’t really a flag that does this. You could basically MEM_RESERVE a range and then register a vectored exception handler that MEM_COMMITs the ranges that you access.
This would be a solution. I don’t have much time to check the code recently, but let me know if you have any questions about how to do this…
I once tried this but finally gave up. IIRC, it’s due to the fact that we don’t have a good place to write the big try-catch.
You could use AddVectoredExceptionHandler to register an exception handler, something like this:
#include <windows.h>

// TODO: these have to be set during initialization
static char *jitSectionPtr;
static ULONG_PTR jitSectionSize;

static LONG CALLBACK MyHandler(_EXCEPTION_POINTERS *ExceptionInfo) {
    auto record = ExceptionInfo->ExceptionRecord;
    if (record->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
        // ExceptionInformation[1] holds the address that was accessed
        auto address = (char *)record->ExceptionInformation[1];
        if (address >= jitSectionPtr && address < jitSectionPtr + jitSectionSize) {
            // Commit the page containing the faulting address, then retry the access
            if (VirtualAlloc(address, 1, MEM_COMMIT, PAGE_EXECUTE_READWRITE)) {
                return EXCEPTION_CONTINUE_EXECUTION;
            }
        }
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

void initialize() {
    // First argument 0: call this handler after previously registered ones
    AddVectoredExceptionHandler(0, MyHandler);
}
On 64-bit targets you can reserve an arbitrary size. On 32-bit targets the address space is limited to 2 or 4 GB, so this solution wouldn't improve anything.
I see and I will have a look.
AddVectoredExceptionHandler
I finally recall why I gave up on this approach: we need some mechanism to generate a handler for every unicorn instance, i.e., we need closures because we need to wrap every distinct uc object, or we might wrongly commit another instance's memory.
A possible workaround is to share the same handler for all instances and commit the memory anyway but it might make things worse(?)
I would say that either you share the whole RWX section between all instances, in which case you can just commit on access when the memory is in the range.
Alternatively you would have a range per instance, so it’s a matter of saving them in a global and iterating all instances and check the range.
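The per-instance variant can be sketched as a process-wide table that a single shared handler consults to find which instance owns a faulting address. This is only an illustration in Rust, not the code used by Unicorn; `JitRange`, `register_range`, `unregister_instance`, and `find_owner` are hypothetical names, and `instance_id` stands in for a pointer to the uc object.

```rust
use std::sync::Mutex;

/// Hypothetical record describing one instance's JIT buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct JitRange {
    instance_id: usize, // stand-in for a pointer to the uc instance
    base: usize,
    len: usize,
}

/// Process-wide table of ranges, shared by every instance.
static RANGES: Mutex<Vec<JitRange>> = Mutex::new(Vec::new());

/// Called when an instance reserves its buffer.
fn register_range(range: JitRange) {
    RANGES.lock().unwrap().push(range);
}

/// Called when an instance is destroyed.
fn unregister_instance(instance_id: usize) {
    RANGES.lock().unwrap().retain(|r| r.instance_id != instance_id);
}

/// What the shared handler would do: iterate all ranges and return the one
/// containing `addr`, or None if the fault belongs to nobody we know.
fn find_owner(addr: usize) -> Option<JitRange> {
    RANGES
        .lock()
        .unwrap()
        .iter()
        .copied()
        .find(|r| addr >= r.base && addr - r.base < r.len)
}

fn main() {
    register_range(JitRange { instance_id: 1, base: 0x1000_0000, len: 0x10_0000 });
    register_range(JitRange { instance_id: 2, base: 0x2000_0000, len: 0x10_0000 });
    println!("{:?}", find_owner(0x2000_0040).map(|r| r.instance_id));
    println!("{:?}", find_owner(0x3000_0000).map(|r| r.instance_id));
    unregister_instance(1);
    unregister_instance(2);
}
```

A real handler would need more care than this sketch shows: taking a lock inside an exception handler can deadlock if the fault happens while the same thread already holds it, so a lock-free or fixed-size table may be a better fit.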
Both of your solutions require a place to record global information across all instances, which breaks a few of our assumptions, especially in some bindings. Another solution is to get a simple closure implementation, either by introducing libffi, which is ubiquitous, or by implementing a simple one. I will investigate a bit more, and thanks for your help!
I don't see how this relates to the bindings. You cannot register an exception handler with state (e.g. a closure). Handlers are process-wide, so if you want to use them you will need to store some global state to get back to the uc instance for that memory range. The alternative would be to properly implement this in QEMU, but this is unlikely to be easier.
you will need to store some global state to get back to the uc instance
That's one of the ways closures work, no?
I implemented demand paging via SEH handlers and a naive closure trampoline here: https://github.com/unicorn-engine/unicorn/commit/3d5b2643f0af742d9b90b4511d0ee137775c8526#diff-842456abe9564ae1e7d75ab8f322be6c27ca3c512e445a18e5898dea68ad9799R872 Let's see what CI says, though everything works on my machine.
Looking forward to your feedback!
All Windows CI passed and this solution doesn't involve any bad hacks, so I think this issue can be closed.
Ping me if there is any bug.