tinymembench

dual vs simple random read tests

Open Blimpyway opened this issue 6 years ago • 2 comments

Hi,

I do not understand the short message stating

Note 2: Dual random read means that we are simultaneously performing two independent memory accesses at a time. In the case if the memory subsystem can't handle multiple outstanding requests, dual random read has the same timings as two single reads performed one after another.

Why doesn't the memory subsystem optimize two consecutive "simple" reads in random_read_test(), while a dual random read call gives nearly the same latency for two reads together as for one?

Is it just because the zerobuffer[] reads at indexes v1 and v2 are placed right next to each other (lines 342 and 343 in main.c)?

I'm asking because on an i5 laptop the random read latency is only slightly (15%) better than the dual random read latency.

Thanks

Blimpyway, Nov 05 '18 23:11

The memory subsystem can't optimize two consecutive "simple" reads because the address used by the second read is calculated from the value obtained from the first read. So the second read can't start before the first read is completed.

And the latency difference between these two methods is exactly what the test is trying to measure. Here is an example of a primitive processor which can't handle multiple outstanding requests: https://github.com/ssvb/tinymembench/wiki/Samsung-N220-(Intel-Atom-N450)

Your i5 processor is doing just fine.

ssvb, Nov 08 '18 04:11

OK, that makes sense. I changed the second array index to depend on the value returned by the first read, and the memory subsystem can't optimize it anymore.

Thanks.

Blimpyway, Nov 08 '18 11:11