Severe Performance Loss (450x) in Unicorn ARM64-to-ARM64 Emulation on Android?
When using Unicorn Engine to emulate ARM64 code on an Android ARM64 device, the performance loss reaches 450x even with no hooks enabled. This seems abnormally high for same-architecture emulation.
Is this normal? If not, what are the possible reasons?
The code is
extern "C" int64_t looptest(){ int64_t j=0; for (int64_t i=0;i<1000000;i++){ j+=1; } return j; }
Some slowdown is expected, but 450x sounds like too much to me. How can this be reproduced?
The native code runs looptest directly, then emulates the same looptest via Unicorn, and logs the elapsed time of each. According to the log, the difference is about 550x (1482573 µs emulated vs. 2697 µs native).
The Unicorn build comes from https://github.com/saicao/unicorn/tree/master, which saicao patched last year; I built it for Android myself. I used this fork because of the issue "UC_HOOK_MEM_READ only triggered once" (#1908) on arm64, which saicao fixed.
This is the log:
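One way to make the native and emulated measurements easier to compare is to time both with the same monotonic clock and to repeat each run a few times. The following is only a minimal sketch: `time_us` and `best_of_us` are hypothetical helper names and are not part of the original test code or of Unicorn.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: run fn once and return the elapsed time in microseconds.
// steady_clock is monotonic, so wall-clock adjustments do not affect the result.
template <typename Fn>
int64_t time_us(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Hypothetical helper: run fn `runs` times and return the fastest run in microseconds.
// Taking the minimum reduces noise from scheduling and cold caches.
template <typename Fn>
int64_t best_of_us(int runs, Fn&& fn) {
    assert(runs > 0);
    int64_t best = time_us(fn);
    for (int i = 1; i < runs; ++i) {
        const int64_t t = time_us(fn);
        if (t < best) best = t;
    }
    return best;
}
```

Under these assumptions, something like `best_of_us(5, [] { looptest(); })` could replace the paired `gettimeofday` calls around the native run, and the same wrapper could surround the `uc_emu_start` call.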
2025-04-21 17:01:29.917 10195-10195 testuniLog com.example.testunicorn I looptest addr is 7d8ef19930
2025-04-21 17:01:29.917 10195-10195 testuniLog com.example.testunicorn I looptest 2697 loop count 1000000
2025-04-21 17:01:31.402 10195-10195 testuniLog com.example.testunicorn I success
2025-04-21 17:01:31.402 10195-10195 testuniLog com.example.testunicorn I emu time 1482573
This is the code:
extern "C"
int64_t looptest(){
    int64_t j=0;
    for (int64_t i=0;i<1000000;i++){
        j+=1;
    }
    return j;
}
void uc_emu(){
    uc_engine *uc=NULL;
    uc_err err = uc_open(UC_ARCH_ARM64, UC_MODE_ARM, &uc);
    if (err) {
        LOGI("Failed on uc_open() with error returned: %u (%s)\n", err,
             uc_strerror(err));
        return ;
    }
    uc_ctl_set_cpu_model(uc, UC_CPU_ARM64_A72);
    uc_mem_map(uc,(uint64_t )looptest&0xfffffffffffff000,4096,7);
    uc_mem_write(uc,(uint64_t )looptest,(void *)looptest,100);
    uint64_t sp_w=4096*2;
    uc_mem_map(uc,4096,4096*2,7);
    uc_reg_write(uc, UC_ARM64_REG_SP, &sp_w);
    struct timeval tv,tv2;
    gettimeofday(&tv, NULL);
    err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);
    gettimeofday(&tv2, NULL);
    if (err) {
        LOGI("Failed on uc_emu_start with error returned: %u (%s)\n", err,
             uc_strerror(err));
        return ;
    }else{
        LOGI("success");
        LOGI("emu time %ld ",(tv2.tv_sec-tv.tv_sec)*1000000+tv2.tv_usec-tv.tv_usec);
    }
}
extern "C" JNIEXPORT jstring
JNICALL
Java_com_example_testunicorn_MainActivity_speedtest(
        JNIEnv *env,
        jobject a2 /* this */) {
    struct timeval tv,tv2;
    gettimeofday(&tv, NULL);
    int64_t j= looptest();
    gettimeofday(&tv2, NULL);
    LOGI("looptest addr is %lx",looptest);
    LOGI("looptest %ld loop count %ld",(tv2.tv_sec-tv.tv_sec)*1000000+tv2.tv_usec-tv.tv_usec,j);
    uc_emu();
    return env->NewStringUTF("");
}
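The code above maps exactly one 4 KiB page, starting at the page that contains `looptest`, and then copies 100 bytes into it. That fits for the two addresses shown in the logs, but the copy would run past the mapped page if the function happened to sit near the end of a page. A minimal sketch of computing a page-aligned range that covers an arbitrary span follows; `PageRange` and `covering_pages` are hypothetical names, and a 4 KiB page size is assumed.

```cpp
#include <cassert>
#include <cstdint>

const uint64_t kPageSize = 4096;  // assumption: 4 KiB pages

// Hypothetical type: a page-aligned region suitable for a mapping call.
struct PageRange {
    uint64_t base;  // start address, rounded down to a page boundary
    uint64_t size;  // length in bytes, a multiple of kPageSize
};

// Hypothetical helper: smallest page-aligned range covering [addr, addr + len).
inline PageRange covering_pages(uint64_t addr, uint64_t len) {
    assert(len > 0);
    const uint64_t base = addr & ~(kPageSize - 1);
    const uint64_t end = (addr + len + kPageSize - 1) & ~(kPageSize - 1);
    return PageRange{base, end - base};
}
```

With the address from the first log, `covering_pages(0x7d8ef19930, 100)` gives base `0x7d8ef19000` and size `0x1000`, which matches what the original code maps. The returned `base` and `size` could then be passed to `uc_mem_map` in place of the hard-coded values.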
@wtdcode
saicao’s patch wasn’t correct. You should try current dev/master instead.
I tried the dev branch. Here is the new log. The performance loss still reaches about 450x.
2025-04-21 17:59:15.087 10836-10836 testuniLog com.example.testunicorn I looptest addr is 7d8c16ee40
2025-04-21 17:59:15.087 10836-10836 testuniLog com.example.testunicorn I looptest 2915 loop count 1000000
2025-04-21 17:59:16.415 10836-10836 testuniLog com.example.testunicorn I success
2025-04-21 17:59:16.415 10836-10836 testuniLog com.example.testunicorn I emu time 1325992
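For reference, the slowdown factor is the emulated time divided by the native time. Below is a small sketch of that arithmetic applied to the numbers from the two logs; `slowdown` and `ns_per_iteration` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: ratio of emulated time to native time (both in microseconds).
inline double slowdown(int64_t emu_us, int64_t native_us) {
    assert(native_us > 0);
    return static_cast<double>(emu_us) / static_cast<double>(native_us);
}

// Hypothetical helper: average cost of one loop iteration in nanoseconds.
inline double ns_per_iteration(int64_t total_us, int64_t iterations) {
    assert(iterations > 0);
    return static_cast<double>(total_us) * 1000.0 / static_cast<double>(iterations);
}
```

With the patched fork, `slowdown(1482573, 2697)` is about 550; with the dev branch, `slowdown(1325992, 2915)` is about 455. Per iteration on the dev branch, that is roughly 2.9 ns natively versus about 1326 ns emulated.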
@wtdcode
err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);
Are you sure this is correct? I doubt that the looptest function is exactly 0x50 bytes long. A slightly better way to test would be something like this:
err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)uc_emu, 0, 0);
But this is still not completely correct: looptest returns, so execution might jump to a stack address (which is not initialized). Also, the compiler is not required to place the functions in the order they appear in the source file.
The ret of looptest is exactly at looptest+0x50 in IDA.
I sometimes use this approach for very quick testing. With flags like -O3, gcc tends to keep everything in registers in this case. But it is certainly not portable, and I'm afraid it is UB.