Severe Performance Loss (450x) in Unicorn ARM64-to-ARM64 Emulation on Android?

Open: Gavin0210 opened this issue 10 months ago • 7 comments

When I use Unicorn Engine to emulate ARM64 code on an Android ARM64 device, emulation is about 450x slower than native execution, even with no hooks installed. This seems abnormally high for same-architecture emulation. Is this normal? If not, what could cause it? The code is:

extern "C" int64_t looptest(){
    int64_t j=0;
    for (int64_t i=0;i<1000000;i++){
        j+=1;
    }
    return j;
}

Gavin0210, Apr 21 '25 06:04

Some slowdown is expected, but 450x sounds like too much to me. How can I reproduce it?

wtdcode, Apr 21 '25 08:04

My test app runs looptest natively, then emulates the same function with Unicorn, and logs both timings in microseconds. According to the log, emulation is more than 500x slower (1482573 µs vs. 2697 µs).

The Unicorn build is https://github.com/saicao/unicorn/tree/master, which saicao patched last year; I built it for Android myself. I used that fork because UC_HOOK_MEM_READ is only triggered once on ARM64 (#1908), and saicao fixed that.

This is the log:

2025-04-21 17:01:29.917 10195-10195 testuniLog              com.example.testunicorn              I  looptest addr is 7d8ef19930
2025-04-21 17:01:29.917 10195-10195 testuniLog              com.example.testunicorn              I  looptest 2697  loop count 1000000
2025-04-21 17:01:31.402 10195-10195 testuniLog              com.example.testunicorn              I  success
2025-04-21 17:01:31.402 10195-10195 testuniLog              com.example.testunicorn              I  emu time 1482573 
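For reference, the slowdown factor here is just the emulated time divided by the native time, both as logged in microseconds. A minimal helper (the name `slowdown` is illustrative, not part of the test app):

```cpp
#include <cassert>

// Slowdown factor: emulated wall time divided by native wall time.
// Both arguments are in microseconds, as printed by the log above.
static double slowdown(long native_us, long emulated_us)
{
    assert(native_us > 0);  // a zero native time would make the ratio meaningless
    return static_cast<double>(emulated_us) / static_cast<double>(native_us);
}
```

With this log, `slowdown(2697, 1482573)` is about 549.7.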

This is the code:

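#include <jni.h>
#include <android/log.h>
#include <cstdint>
#include <sys/time.h>
#include <unicorn/unicorn.h>

// Assumed definition; the original project defines LOGI elsewhere.
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, "testuniLog", __VA_ARGS__)

extern "C"
int64_t looptest(){
    int64_t j=0;
    for (int64_t i=0;i<1000000;i++){
        j+=1;
    }
    return j;
}

// Copies looptest's machine code into Unicorn and times its emulation.
void uc_emu(){
    uc_engine *uc=NULL;
    uc_err err = uc_open(UC_ARCH_ARM64, UC_MODE_ARM, &uc);
    if (err) {
        LOGI("Failed on uc_open() with error returned: %u (%s)\n", err,
             uc_strerror(err));
        return ;
    }

    uc_ctl_set_cpu_model(uc, UC_CPU_ARM64_A72);
    // Map looptest's page at the same address as in the host process and copy
    // 100 bytes of its code. Two pages are mapped so the copy cannot run past
    // the end of the mapping when looptest sits near a page boundary.
    uc_mem_map(uc,(uint64_t )looptest&0xfffffffffffff000,4096*2,UC_PROT_ALL);
    uc_mem_write(uc,(uint64_t )looptest,(void *)looptest,100);
    // Emulated stack: 0x1000-0x3000 is mapped, SP starts at 0x2000.
    uint64_t sp_w=4096*2;
    uc_mem_map(uc,4096,4096*2,UC_PROT_ALL);
    uc_reg_write(uc, UC_ARM64_REG_SP, &sp_w);
    struct timeval tv,tv2;
    gettimeofday(&tv, NULL);
    // Emulation stops when PC reaches looptest+0x50.
    err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);
    gettimeofday(&tv2, NULL);
    if (err) {
        LOGI("Failed on uc_emu_start with error returned: %u (%s)\n", err,
             uc_strerror(err));
    }else{
        LOGI("success");
        LOGI("emu time %ld ",(tv2.tv_sec-tv.tv_sec)*1000000+tv2.tv_usec-tv.tv_usec);
    }
    uc_close(uc);  // release the engine on both paths
}
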
extern "C" JNIEXPORT jstring
JNICALL
Java_com_example_testunicorn_MainActivity_speedtest(
        JNIEnv *env,
        jobject a2 /* this */) {
    // Time the native run first, then the emulated run inside uc_emu().
    struct timeval tv,tv2;
    gettimeofday(&tv, NULL);
    int64_t j= looptest();
    gettimeofday(&tv2, NULL);
    LOGI("looptest addr is %lx",(unsigned long)looptest);
    LOGI("looptest %ld  loop count %ld",(tv2.tv_sec-tv.tv_sec)*1000000+tv2.tv_usec-tv.tv_usec,(long)j);
    uc_emu();
    return env->NewStringUTF("");
}

@wtdcode

Gavin0210, Apr 21 '25 09:04

saicao’s patch wasn’t correct. You should try the current dev/master branch instead.

wtdcode, Apr 21 '25 09:04

I tried the dev branch. Here is the new log; the slowdown is still about 450x:

2025-04-21 17:59:15.087 10836-10836 testuniLog              com.example.testunicorn              I  looptest addr is 7d8c16ee40
2025-04-21 17:59:15.087 10836-10836 testuniLog              com.example.testunicorn              I  looptest 2915  loop count 1000000
2025-04-21 17:59:16.415 10836-10836 testuniLog              com.example.testunicorn              I  success
2025-04-21 17:59:16.415 10836-10836 testuniLog              com.example.testunicorn              I  emu time 1325992 
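Comparing this log with the previous one, the ratio moved for two reasons: the emulated run got faster and the native baseline got slower. A small helper (hypothetical name `percent_change`) makes that arithmetic explicit:

```cpp
#include <cassert>

// Relative change from an old measurement to a new one, in percent.
// Negative means the new value is smaller (for timings: faster).
static double percent_change(double old_value, double new_value)
{
    assert(old_value != 0.0);  // the change is undefined relative to zero
    return (new_value - old_value) / old_value * 100.0;
}
```

`percent_change(1482573, 1325992)` is about -10.6 (emulation on dev is faster), while `percent_change(2697, 2915)` is about +8.1 (the native run is slower). So part of the drop from roughly 550x to 455x likely comes from run-to-run variation in a ~3 ms native measurement rather than from Unicorn itself.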

@wtdcode

Gavin0210, Apr 21 '25 10:04

    err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);

Are you sure this is correct? I doubt that the looptest function is exactly 0x50 bytes long. A slightly better way to test would be something like this:

err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)uc_emu, 0, 0);

But this is still not completely correct: looptest returns, so execution might jump to a return address that was never initialized (on ARM64, `ret` branches to X30, which this test never sets). Also, the compiler is not required to lay out functions in the order they appear in the source file.

PhilippTakacs, Apr 22 '25 07:04

The ret instruction of looptest is exactly at looptest+0x50 in IDA.

Gavin0210, Apr 22 '25 08:04

I have sometimes used this approach for very quick tests. With flags like -O3, gcc tends to keep everything in registers in this case. But it is certainly not portable, and I'm afraid it is UB.

wtdcode, Apr 22 '25 08:04