timing(IPrefetch): add 1 cycle to s2_finish
Cut the following critical path to improve timing: prefetchPipe s2 -> toMSHRArbiter.valid(i) -> toMSHR.paddr -> missUnit hit -> missUnit.req.ready -> prefetchPipe toMSHRArbiter.ready -> s2_finish -> s2_ready -> s1_ready -> toFtq.ready.
This effectively adds 1 cycle to the prefetchPipe s2_finish. Only a minor performance change is expected: the first miss request is still issued at the same time as before, and the extra wait for subsequent miss requests can be hidden by the L2 cache access latency.
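The latency-hiding argument can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual prefetchPipe or missUnit behavior: it assumes the L2 side services miss requests one at a time, and the function name `refill_done_cycles` and the latency value are hypothetical.

```python
def refill_done_cycles(issue_cycles, l2_latency):
    """Return the cycle at which each miss request's refill completes.

    Toy assumption: the L2 side handles one request at a time, so a
    request starts only when it has been issued AND the previous
    request has finished.
    """
    done = []
    port_free = 0  # first cycle at which the (assumed) serial L2 port is idle
    for issue in issue_cycles:
        start = max(issue, port_free)
        port_free = start + l2_latency
        done.append(port_free)
    return done


L2_LATENCY = 20  # hypothetical value, not taken from the design

# Before: two miss requests issued back to back (cycles 0 and 1).
before = refill_done_cycles([0, 1], L2_LATENCY)
# After: the first request is unchanged, the second waits one extra cycle.
after = refill_done_cycles([0, 2], L2_LATENCY)

print("before:", before)
print("after: ", after)
print("extra cycle hidden:", before == after)
```

In this model the extra cycle is invisible whenever the L2 latency exceeds the added issue delay. With a latency of 1 the two schedules would differ, which is the case the commit message argues does not arise in practice.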
[Generated by IPC robot] commit: ca2c79f26008574c9a1e0f9aeb2363452ae7f8b8
| commit | astar | copy_and_run | coremark | gcc | gromacs | lbm | linux | mcf | microbench | milc | namd | povray | wrf | xalancbmk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ca2c79f | 1.888 | 0.460 | 2.709 | 1.198 | 2.829 | 2.487 | 2.395 | 0.924 | 1.384 | 1.377 | 3.349 | 2.755 | 2.423 | 3.195 |
master branch:
| commit | astar | copy_and_run | coremark | gcc | gromacs | lbm | linux | mcf | microbench | milc | namd | povray | wrf | xalancbmk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| b30cb8b | | 0.460 | 2.695 | | | | 2.401 | 0.919 | 1.379 | | | 2.751 | | |
| a53daa0 | | 0.460 | 2.695 | 1.186 | | | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | | 3.212 |
| 8b2f7ab | 1.865 | 0.460 | 2.695 | 1.186 | 2.822 | 2.490 | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | 2.418 | 3.212 |
| dd286b6 | | 0.460 | 2.695 | 1.186 | 2.822 | 2.490 | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | | 3.212 |
| e6f36bc | 1.855 | 0.460 | 2.695 | 1.186 | 2.822 | 2.490 | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | 2.418 | 3.212 |
| 3088616 | 1.855 | 0.460 | 2.695 | 1.186 | 2.822 | 2.490 | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | 2.418 | 3.212 |
| 497660c | 1.855 | 0.460 | 2.695 | 1.186 | 2.822 | 2.490 | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | 2.418 | 3.212 |
| 65e844f | 1.865 | 0.460 | 2.695 | 1.186 | 2.822 | 2.490 | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | 2.418 | 3.212 |
| 0d7009b | 1.855 | 0.460 | 2.695 | 1.186 | 2.822 | 2.490 | 2.401 | 0.919 | 1.379 | 1.454 | 3.362 | 2.751 | 2.418 | 3.212 |
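To make the comparison easier to read, the per-benchmark change between this PR and master can be computed directly from the two tables. The sketch below uses the ca2c79f row and, as an assumption, the complete master row 8b2f7ab as the baseline (master rows differ slightly on astar, 1.865 vs 1.855); the helper name is hypothetical.

```python
BENCHMARKS = [
    "astar", "copy_and_run", "coremark", "gcc", "gromacs", "lbm", "linux",
    "mcf", "microbench", "milc", "namd", "povray", "wrf", "xalancbmk",
]

# IPC values copied from the tables above.
PR_IPC = [1.888, 0.460, 2.709, 1.198, 2.829, 2.487, 2.395,
          0.924, 1.384, 1.377, 3.349, 2.755, 2.423, 3.195]      # ca2c79f
MASTER_IPC = [1.865, 0.460, 2.695, 1.186, 2.822, 2.490, 2.401,
              0.919, 1.379, 1.454, 3.362, 2.751, 2.418, 3.212]  # 8b2f7ab (assumed baseline)


def relative_change_percent(new, old):
    """Relative IPC change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


changes = {
    name: relative_change_percent(pr, master)
    for name, pr, master in zip(BENCHMARKS, PR_IPC, MASTER_IPC)
}

# Print from largest regression to largest improvement.
for name, delta in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{name:>14}: {delta:+.2f}%")
```

With these numbers, milc shows the largest drop and astar the largest gain; most other benchmarks move by well under 1%.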
This PR might have some performance drawbacks. Since frontend timing currently satisfies the requirements, this PR is not necessary for now.
SPEC 06 (0.3 coverage) tests show overall performance essentially unchanged (~0.03% increase), with fluctuations of roughly ±1% at individual test points (GemsFDTD -1.11%; zeusmp +0.74%). I guess this is acceptable.