Performance slowdown on Windows
Hello,
I cloned the mimalloc repository and opened the mimalloc solution in Visual Studio 2022.
I slightly modified the sources of the 'mimalloc-test-override' project to measure performance and compare it with the standard operator new, as follows:
- uncommented the call to 'test_mt_shutdown();' in the main() function
- changed the code of 'test_mt_shutdown' to:
static void test_mt_shutdown()
{
    const int threads = 100;
    //std::vector< std::future< std::vector< char* > > > ts;
    auto fn = [&]()
    {
        int sz = 100000000000;
        char* p = new char[sz];
        //std::vector< char* > ps;
        //ps.reserve(10000);
        //for (int i = 0; i < 10000; i++)
        //ps.emplace_back(new char[10000]);
        for (int i = 0; i < sz; i++) {
            p[i] = '0';
        }
        delete[] p;
    };
    for (int i = 0; i < threads; i++) {
        auto v = std::async(std::launch::async, fn);
        v.wait();
    }
    /*for (auto& f : ts)
    for (auto& p : f.get())
    delete[] p;*/
    std::cout << "done" << std::endl;
}
and I got the following result:
done
heap stats: peak total current block total#
reserved: 35.1 GiB 35.1 GiB 35.1 GiB
committed: 34.1 GiB 34.1 GiB 34.1 GiB
reset: 0
purged: 4.3 MiB
touched: 192.7 KiB 2.0 MiB -33.9 GiB
segments: 3 32 1 not all freed
-abandoned: 0 0 0 ok
-cached: 0 0 0 ok
pages: 0 0 -103 not all freed
-abandoned: 0 0 0 ok
-extended: 0
-retire: 0
arenas: 1
-rollback: 0
mmaps: 174
commits: 86
resets: 0
purges: 69
guarded: 0
threads: 2 3 2 not all freed
searches: 0.0 avg
numa nodes: 2
elapsed: 89.913 s
process: user: 1.265 s, system: 21.390 s, faults: 8924915, rss: 21.3 GiB, commit: 35.3 GiB
After that, I created a sample C++ console application with similar code, shown below, that does not use mimalloc:
#include <Windows.h>
#include <iostream>
#include <future>
int main()
{
    DWORD flags = GetTickCount64();
    std::cout << "tick=" << flags << std::endl;
    const int threads = 100;
    //std::vector< std::future< std::vector< char* > > > ts;
    auto fn = [&]()
    {
        int sz = 100000000000;
        char* p = new char[sz];
        //std::vector< char* > ps;
        //ps.reserve(10000);
        //for (int i = 0; i < 10000; i++)
        //ps.emplace_back(new char[10000]);
        for (int i = 0; i < sz; i++) {
            p[i] = '0';
        }
        delete[] p;
    };
    for (int i = 0; i < threads; i++) {
        auto v = std::async(std::launch::async, fn);
        v.wait();
    }
    DWORD flags2 = GetTickCount64();
    std::cout << "tick=" << flags2 - flags << std::endl;
    return 0;
}
and got the following result:
tick=571657031
tick=18390
In my experiment, the default operator new / delete[] is much faster than mimalloc: 89 seconds (mimalloc) vs. 18 seconds (default new). Why does mimalloc perform poorly with operator new, while it performs well in the stress test written in C using calloc / realloc / free?
Is there any way to improve the performance of operator new in this sample?
The tests were performed on Windows 11.
Hi @nexan-pro -- thanks for the test. However, the test has several issues. For example, the allocation size is 100000000000 (100 GB), which means it is allocated by either allocator directly from the OS (and thus does not reflect the performance of the allocator). Moreover, you used int to store the size, but an int on Windows is always 32-bit, so it is truncated and the actual allocation is 1215752192 bytes (about 1.2 GB), which is still allocated directly from the OS. When I test with the latest dev, dev2, or dev3, the time is about the same as with the Windows allocator (since the performance is dominated by the OS allocation). (Also, GetTickCount64 returns a 64-bit integer as ULONGLONG, and since DWORD is 32-bit, the value can be cut off.)
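To make the truncation concrete, here is a small sketch that models both narrowing conversions mentioned above. The helper names (truncate_to_int32, truncate_to_dword, print_truncation_demo) are made up for illustration and are not part of mimalloc or the Windows API; the code only assumes that a 32-bit int or DWORD keeps the low 32 bits of a 64-bit value, as MSVC does.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical helper: models storing a 64-bit value in a 32-bit int,
// where only the low 32 bits survive.
constexpr std::int32_t truncate_to_int32(std::uint64_t value) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(value));
}

// Hypothetical helper: models storing a 64-bit tick count
// (such as the result of GetTickCount64) in a 32-bit DWORD.
constexpr std::uint32_t truncate_to_dword(std::uint64_t ticks) {
    return static_cast<std::uint32_t>(ticks);
}

void print_truncation_demo() {
    const std::uint64_t requested = 100000000000ULL;  // size written in the test
    std::cout << "requested:     " << requested << " bytes\n";
    std::cout << "as 32-bit int: " << truncate_to_int32(requested) << " bytes\n";

    // A tick count just past 2^32 ms wraps around to a small number.
    const std::uint64_t ticks = (1ULL << 32) + 5;
    assert(truncate_to_dword(ticks) == 5u);
    std::cout << "ticks:         " << ticks << "\n";
    std::cout << "as DWORD:      " << truncate_to_dword(ticks) << "\n";
}
```

Calling print_truncation_demo() from main() shows the requested size next to the size that would actually be passed to operator new. Declaring the size as std::size_t and the tick values as ULONGLONG avoids both truncations.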
However, when I ran the test with the main branch it was somehow slower -- it turns out there was a bug there in the tracking of the allocation size of OS-allocated blocks. I will try to update the main branch soon with the latest dev2; thanks! :-)
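For anyone rerunning the comparison, a version of the test that avoids the two 32-bit pitfalls might look roughly like the sketch below. It is only a sketch: the function names (alloc_and_fill, time_rounds) are hypothetical, std::chrono::steady_clock is swapped in for GetTickCount64 so that the code is portable, and the size and round count are left to the caller. Very large sizes will still be served directly by the OS with either allocator, so smaller sizes say more about the allocator itself.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <memory>

// Allocates `size` bytes with operator new[], writes every byte, and frees
// the block. Returns `size` if the first and last bytes hold the expected
// value, so the writes are observable to the caller.
std::size_t alloc_and_fill(std::size_t size) {
    if (size == 0) return 0;
    std::unique_ptr<char[]> p(new char[size]);  // freed with delete[] on return
    for (std::size_t i = 0; i < size; i++) {    // std::size_t index: no 32-bit wrap
        p[i] = '0';
    }
    return (p[0] == '0' && p[size - 1] == '0') ? size : 0;
}

// Runs `rounds` allocations, each on its own thread via std::async and waited
// on one at a time (as in the original test), and returns the elapsed time.
std::chrono::milliseconds time_rounds(std::size_t size, int rounds) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; i++) {
        auto f = std::async(std::launch::async, alloc_and_fill, size);
        f.wait();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}
```

For example, time_rounds(std::size_t(64) << 20, 100).count() gives the number of milliseconds for 100 rounds of 64 MiB each. Building the same file once with and once without the mimalloc override gives two numbers that can be compared directly.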
Thank you @daanx for the feedback! I will wait for the update in the main branch 🙈