The reader.read_threads parameter cannot be set as high as the thread count reported by nproc.
My machine has 512 hardware threads (checked with the nproc command): 512 = 128 cores x 2 sockets x 2 (Hyper-Threading on). It seems that this tool cannot use as many threads as nproc reports. As you can see in the capture below, the reader.read_threads parameter can only be set up to 128.
I checked again, and I think it only sweeps up to 128 threads in any environment. I don't know whether this is a bottleneck caused by DLIO. However, according to the run parameter guide you pointed me to in #202, I think the total thread count reported by nproc should be assignable to reader.read_threads.
Could you check this problem?
@xdreamcoder The traceback you've posted here indicates that you've exceeded the maximum number of files your host can have open concurrently (OSError: Too many open files). This isn't inherently a problem with the benchmark. The reader.read_threads parameter is scoped per accelerator, so your command spawns 512 threads for each simulated accelerator, which is overkill (and, as the error shows, more than the OS allows). You should only need up to 32 threads per accelerator (even that may be overkill depending on your CPU).
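To make the per-accelerator scoping concrete, here is a minimal Python sketch of the arithmetic. It is not part of the benchmark: the helper names and the accelerator count of 8 are assumptions chosen for illustration, and the cap of 32 is just the rule of thumb mentioned above. It also prints the soft open-file limit for the current process, which is the limit behind "Too many open files".

```python
import os
import resource  # Unix-only standard-library module


def suggest_read_threads(total_cpus, num_accelerators, cap=32):
    """Hypothetical helper: share CPUs across accelerators, capped at `cap`."""
    if num_accelerators < 1:
        raise ValueError("num_accelerators must be at least 1")
    per_accelerator = total_cpus // num_accelerators
    # Never return fewer than 1 thread or more than the cap.
    return max(1, min(cap, per_accelerator))


def total_reader_threads(read_threads, num_accelerators):
    """Total reader threads when the setting applies per accelerator."""
    return read_threads * num_accelerators


def open_file_soft_limit():
    """Soft limit on open file descriptors for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    accelerators = 8  # assumption for illustration; use your own count
    per_acc = suggest_read_threads(cpus, accelerators)
    print("suggested read threads per accelerator:", per_acc)
    print("total reader threads:", total_reader_threads(per_acc, accelerators))
    print("soft open-file limit:", open_file_soft_limit())
```

With 512 threads per accelerator, the total grows with every accelerator you simulate, so comparing that total against the soft limit printed here (the same value `ulimit -n` reports in the shell that launches the run) gives a rough idea of how quickly the limit can be reached.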