
generating indexed_binary files causes kernel OOM to kill process

Open krehm opened this issue 10 months ago • 0 comments

While testing other code, I tried to generate 168 indexed_binary sample files using a single dlio_benchmark process. The process's memory grows with each file created; by the time it is creating file number 49, its memory has reached 240 GB and the kernel OOM killer kills the process.

The memory growth occurs in the generate() method in indexed_binary_generator.py. Since only a single process was used (comm_size == 1), the else clause in that routine is what produces the sample files.

This statement causes the memory problem:

binary_data = struct.pack(myfmt, *records[:data_to_write])

Print statements placed before it produce output for every file, including file #49, but a print statement placed after it never prints for the file on which the process is killed. After some searching, I found that the struct module caches compiled format strings. I couldn't find documentation on the caching policy, or on when (or whether) evictions are ever done, but there is a function

struct._clearcache()

which, if called immediately after the binary_data has been written to data_file, releases the cache memory and the size of the process then stays reasonably constant as all 168 files are created.
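To illustrate the workaround, here is a minimal sketch of the pattern, not the actual dlio_benchmark code. The function names (write_packed_file, generate_files), the file naming, and the format string are all hypothetical stand-ins, since the real myfmt and records are built elsewhere in generate(). Note that struct._clearcache() is a private CPython helper, so relying on it is an assumption that may not hold on other interpreters or future versions.

```python
import os
import struct


def write_packed_file(path, records):
    """Pack `records` (ints in 0..255) as unsigned bytes and write them to `path`."""
    # Assumed format for illustration only; the real myfmt may differ.
    fmt = f"{len(records)}B"
    binary_data = struct.pack(fmt, *records)
    with open(path, "wb") as data_file:
        data_file.write(binary_data)
    # Drop struct's internal cache of compiled formats right after the write,
    # so cached state does not pile up from one file to the next.
    struct._clearcache()
    return len(binary_data)


def generate_files(directory, num_files, records_per_file):
    """Write `num_files` sample files into `directory` and return their paths."""
    records = [i % 256 for i in range(records_per_file)]
    paths = []
    for index in range(num_files):
        # Hypothetical file naming, chosen only for this sketch.
        path = os.path.join(directory, f"sample_{index}.bin")
        write_packed_file(path, records)
        paths.append(path)
    return paths
```

The key point is only the placement of the struct._clearcache() call: once per file, immediately after binary_data has been written.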

krehm, Apr 08 '24 14:04