coreblocks
Add pipelining support to LSU requester
Here is a small refactor of the LSURequester. It now supports request pipelining by using a FIFO. Additionally, the unit tests had to be updated, because after this change DummyLSU can reorder misaligned instructions ahead of the correct ones.
Based on #696
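For readers unfamiliar with the idea, here is a rough behavioural sketch of FIFO-based request pipelining. It is plain Python, not the Amaranth code from this PR, and every name in it (`Request`, `PipelinedRequesterModel`, `issue`, `accept`, `depth`) is hypothetical. The assumption is that the requester may issue a new request before the previous response has arrived, and a FIFO of in-flight requests is used to match responses to requests in order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    addr: int
    is_store: bool
    data: int = 0


class PipelinedRequesterModel:
    """Behavioural model only; names and structure are assumptions, not the PR's code."""

    def __init__(self, depth: int = 2):
        self.depth = depth        # maximum number of requests in flight
        self.in_flight = deque()  # FIFO of issued, not yet answered requests
        self.memory = {}          # toy backing store standing in for the bus

    def can_issue(self) -> bool:
        # A new request may be issued while older ones are still pending.
        return len(self.in_flight) < self.depth

    def issue(self, req: Request) -> None:
        if not self.can_issue():
            raise RuntimeError("request FIFO is full")
        self.in_flight.append(req)

    def accept(self) -> tuple:
        # Responses are matched against the oldest in-flight request.
        req = self.in_flight.popleft()
        if req.is_store:
            self.memory[req.addr] = req.data
            return req.addr, 0
        return req.addr, self.memory.get(req.addr, 0)


model = PipelinedRequesterModel(depth=2)
model.issue(Request(addr=0x10, is_store=True, data=42))
model.issue(Request(addr=0x10, is_store=False))  # issued before the store is answered
first = model.accept()
second = model.accept()
```

Without the FIFO, the second `issue` would have to wait until the first `accept`. A bus that cannot overlap requests gains nothing from this, which would be consistent with unchanged performance numbers on such a bus.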
Benchmarks summary
Performance benchmarks
aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
---|---|---|---|---|---|---|---|
0.407 (0.000) | 0.527 (0.000) | 0.321 (0.000) | 0.652 (0.000) | 0.345 (0.000) | 0.283 (0.000) | 0.317 (0.000) | 0.405 (0.000) |
You can view all the metrics here.
Synthesis benchmarks (basic)
Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
---|---|---|---|---|
▼ 21866 (-601) | ▲ 5569 (+8) | ▼ 770 (-32) | ▲ 1012 (+8) | ▼ 48 (-1) |
Synthesis benchmarks (full)
Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
---|---|---|---|---|
▼ 33465 (-213) | ▲ 8811 (+8) | 1932 (0) | ▲ 1192 (+8) | ▲ 42 (+2) |
Benchmarks summary
Performance benchmarks
aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
---|---|---|---|---|---|---|---|
0.407 (0.000) | 0.527 (0.000) | 0.321 (0.000) | 0.652 (0.000) | 0.345 (0.000) | 0.283 (0.000) | 0.317 (0.000) | 0.405 (0.000) |
You can view all the metrics here.
Synthesis benchmarks (basic)
Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
---|---|---|---|---|
▲ 23085 (+163) | ▲ 5569 (+8) | ▲ 802 (+32) | ▲ 1012 (+8) | ▼ 46 (-4) |
Synthesis benchmarks (full)
Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
---|---|---|---|---|
▼ 32440 (-1665) | ▲ 8811 (+8) | ▼ 1932 (-32) | ▲ 1192 (+8) | ▼ 41 (-1) |
Benchmarks summary
Performance benchmarks
aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
---|---|---|---|---|---|---|---|
0.407 (0.000) | 0.527 (0.000) | 0.321 (0.000) | 0.652 (0.000) | 0.345 (0.000) | 0.283 (0.000) | 0.317 (0.000) | 0.405 (0.000) |
You can view all the metrics here.
Synthesis benchmarks (basic)
Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
---|---|---|---|---|
▲ 24084 (+1162) | ▲ 5569 (+8) | ▲ 802 (+32) | ▲ 1012 (+8) | ▼ 49 (-1) |
Synthesis benchmarks (full)
Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
---|---|---|---|---|
▼ 30477 (-3628) | ▲ 8811 (+8) | 1964 (0) | ▲ 1192 (+8) | ▼ 41 (-1) |
No change in benchmarks, as Wishbone Classic doesn't support pipelining.
Yes, I expected that, but I ran the benchmarks to make sure there is no regression.