coreblocks

Add pipelining support to LSU requester

Open • lekcyjna123 opened this issue • 5 comments

Here is a small refactor of the LSURequester: it now supports request pipelining by using a FIFO. Additionally, the unit tests had to be updated, because after this change DummyLSU can reorder misaligned instructions ahead of the correct (aligned) ones.
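To illustrate why a FIFO enables pipelining, here is a behavioural sketch in plain Python. It is not the actual Amaranth code in coreblocks; the class `PipelinedRequester`, the `PendingRequest` fields and the byte-extraction logic are hypothetical. The idea it models: when a request is issued, the metadata needed to post-process its response is pushed into a FIFO, so the requester can issue the next request without waiting. Assuming the bus answers in order, the oldest FIFO entry always belongs to the next response.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingRequest:
    # Hypothetical per-load metadata needed to post-process the response.
    addr: int
    size: int  # access width in bytes: 1, 2 or 4


class PipelinedRequester:
    """Behavioural model only; not the coreblocks LSURequester."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.pending: deque[PendingRequest] = deque()

    def can_issue(self) -> bool:
        # The issue side stalls only when the FIFO is full,
        # not while an earlier request is still waiting for its response.
        return len(self.pending) < self.depth

    def issue(self, addr: int, size: int) -> None:
        if not self.can_issue():
            raise RuntimeError("FIFO full: the issue side must stall")
        self.pending.append(PendingRequest(addr, size))

    def respond(self, word: int) -> int:
        # Responses arrive in request order, so the oldest entry matches this word.
        req = self.pending.popleft()
        shift = (req.addr & 0b11) * 8  # byte offset within the 32-bit word
        mask = (1 << (8 * req.size)) - 1
        return (word >> shift) & mask


r = PipelinedRequester(depth=2)
r.issue(0x1001, 1)  # byte load at offset 1
r.issue(0x2002, 2)  # halfword load at offset 2, issued before the first response
print(r.can_issue())  # False: two loads in flight, FIFO full
print(hex(r.respond(0xAABBCCDD)))  # 0xcc
print(hex(r.respond(0x11223344)))  # 0x1122
```

Without the FIFO, the requester would have to hold a single request's metadata in registers and block until the response came back, which allows only one access in flight at a time.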

Based on #696

lekcyjna123, May 05 '24 11:05

Benchmarks summary

Performance benchmarks

aha-mont64: 0.407 (0.000)
crc32: 0.527 (0.000)
minver: 0.321 (0.000)
nettle-sha256: 0.652 (0.000)
nsichneu: 0.345 (0.000)
slre: 0.283 (0.000)
statemate: 0.317 (0.000)
ud: 0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation (ECP5): ▼ 21866 (-601)
LUTs used as DFF (ECP5): ▲ 5569 (+8)
LUTs used as carry (ECP5): ▼ 770 (-32)
LUTs used as RAM (ECP5): ▲ 1012 (+8)
Max clock frequency (Fmax): ▼ 48 (-1)

Synthesis benchmarks (full)

Device utilisation (ECP5): ▼ 33465 (-213)
LUTs used as DFF (ECP5): ▲ 8811 (+8)
LUTs used as carry (ECP5): 1932 (0)
LUTs used as RAM (ECP5): ▲ 1192 (+8)
Max clock frequency (Fmax): ▲ 42 (+2)

github-actions[bot], May 05 '24 11:05

Benchmarks summary

Performance benchmarks

aha-mont64: 0.407 (0.000)
crc32: 0.527 (0.000)
minver: 0.321 (0.000)
nettle-sha256: 0.652 (0.000)
nsichneu: 0.345 (0.000)
slre: 0.283 (0.000)
statemate: 0.317 (0.000)
ud: 0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation (ECP5): ▲ 23085 (+163)
LUTs used as DFF (ECP5): ▲ 5569 (+8)
LUTs used as carry (ECP5): ▲ 802 (+32)
LUTs used as RAM (ECP5): ▲ 1012 (+8)
Max clock frequency (Fmax): ▼ 46 (-4)

Synthesis benchmarks (full)

Device utilisation (ECP5): ▼ 32440 (-1665)
LUTs used as DFF (ECP5): ▲ 8811 (+8)
LUTs used as carry (ECP5): ▼ 1932 (-32)
LUTs used as RAM (ECP5): ▲ 1192 (+8)
Max clock frequency (Fmax): ▼ 41 (-1)

github-actions[bot], May 05 '24 13:05

Benchmarks summary

Performance benchmarks

aha-mont64: 0.407 (0.000)
crc32: 0.527 (0.000)
minver: 0.321 (0.000)
nettle-sha256: 0.652 (0.000)
nsichneu: 0.345 (0.000)
slre: 0.283 (0.000)
statemate: 0.317 (0.000)
ud: 0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation (ECP5): ▲ 24084 (+1162)
LUTs used as DFF (ECP5): ▲ 5569 (+8)
LUTs used as carry (ECP5): ▲ 802 (+32)
LUTs used as RAM (ECP5): ▲ 1012 (+8)
Max clock frequency (Fmax): ▼ 49 (-1)

Synthesis benchmarks (full)

Device utilisation (ECP5): ▼ 30477 (-3628)
LUTs used as DFF (ECP5): ▲ 8811 (+8)
LUTs used as carry (ECP5): 1964 (0)
LUTs used as RAM (ECP5): ▲ 1192 (+8)
Max clock frequency (Fmax): ▼ 41 (-1)

github-actions[bot], May 05 '24 13:05

No change in benchmarks, as Wishbone Classic doesn't support pipelining.
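A rough cycle-count model shows why request pipelining cannot pay off on Wishbone Classic. The model rests on simplifying assumptions (fixed response latency, in-order responses, at most one request issued per cycle), and `simulate` is a hypothetical helper, not part of coreblocks. `max_outstanding=1` approximates Wishbone Classic, where the master must wait for ACK before starting the next transfer; a larger value approximates a pipelined bus (such as Wishbone pipelined mode) feeding a requester FIFO of that depth.

```python
from collections import deque


def simulate(n_requests: int, latency: int, max_outstanding: int) -> int:
    """Return the cycle in which the last response arrives.

    Assumed model: at most one request is issued per cycle, every response
    arrives exactly `latency` cycles after its request, in request order.
    """
    if latency < 1 or max_outstanding < 1:
        raise ValueError("latency and max_outstanding must be at least 1")
    issued = completed = 0
    in_flight: deque[int] = deque()  # arrival cycle of each outstanding response
    cycle = 0
    while completed < n_requests:
        cycle += 1
        # Retire responses due this cycle first, freeing slots for new requests.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            completed += 1
        # Issue a new request if any remain and the outstanding limit allows it.
        if issued < n_requests and len(in_flight) < max_outstanding:
            in_flight.append(cycle + latency)
            issued += 1
    return cycle


print(simulate(4, latency=3, max_outstanding=1))  # 13: classic, one at a time
print(simulate(4, latency=3, max_outstanding=4))  # 7: pipelined
```

With a single outstanding request the total grows roughly as `n * latency`, while the pipelined case approaches `n + latency`; the FIFO only helps once the bus accepts a new request before the previous one is acknowledged.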

tilk, May 05 '24 13:05

> No change in benchmarks, as Wishbone Classic doesn't support pipelining.

Yes, I expected that, but I ran the benchmarks to make sure there is no regression.

lekcyjna123, May 05 '24 13:05