coreblocks Add pipelining support to LSU requester

This is a small refactor of the LSURequester: it now supports request pipelining by using a FIFO. Additionally, the unit tests had to be updated, because after this change DummyLSU started to allow misaligned instructions to be reordered before the correct ones.

Based on #696
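To illustrate why a FIFO enables pipelining, here is a minimal Python behavioural sketch. It is not the actual hardware description of `LSURequester`. `PipelinedRequesterModel`, `PendingRequest` and the `depth` parameter are hypothetical names. The sketch assumes a bus that answers requests in order. It also assumes that misaligned accesses are reported immediately without any bus request, which is one way such accesses could overtake older in-flight ones.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingRequest:
    addr: int  # byte address of the access
    size: int  # access width in bytes (1, 2 or 4)


class PipelinedRequesterModel:
    """Hypothetical behavioural model; NOT the coreblocks LSURequester.

    Metadata of every issued bus request is pushed into a FIFO, so a new
    request can be sent before earlier responses have come back. The bus
    is assumed to answer in order, so each response belongs to the oldest
    pending entry.
    """

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth  # hypothetical FIFO depth = max outstanding requests
        self.pending = deque()  # FIFO of PendingRequest

    def issue(self, addr: int, size: int) -> str:
        # Assumption: misaligned accesses fault at once and never reach the
        # bus, so they can complete before older requests still in flight.
        if addr % size != 0:
            return "misaligned"
        if len(self.pending) >= self.depth:
            return "stall"  # FIFO full: back-pressure the issuing side
        self.pending.append(PendingRequest(addr, size))
        return "issued"

    def respond(self, word: int) -> int:
        # In-order bus: the oldest pending request owns this response word.
        req = self.pending.popleft()
        shift = (req.addr % 4) * 8
        mask = (1 << (8 * req.size)) - 1
        return (word >> shift) & mask


lsu = PipelinedRequesterModel(depth=2)
print(lsu.issue(0x100, 4))  # issued
print(lsu.issue(0x106, 2))  # issued, while the first is still outstanding
print(lsu.issue(0x10A, 2))  # stall: two requests already in flight
print(lsu.issue(0x101, 4))  # misaligned: reported before older responses
print(hex(lsu.respond(0xDEADBEEF)))  # 0xdeadbeef (word load at 0x100)
print(hex(lsu.respond(0x12345678)))  # 0x1234 (upper half for 0x106)
```

The key choice in this sketch is keeping per-request metadata in a FIFO rather than in a single register. That is what allows a second request to be issued before the first response arrives. On a bus that handles only one transaction at a time, such as Wishbone Classic, that extra depth is never used.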

May 05 '24 11:05 lekcyjna123

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.407 (0.000)	0.527 (0.000)	0.321 (0.000)	0.652 (0.000)	0.345 (0.000)	0.283 (0.000)	0.317 (0.000)	0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation (ECP5)	LUTs used as DFF (ECP5)	LUTs used as carry (ECP5)	LUTs used as RAM (ECP5)	Max clock frequency (Fmax)
▼ 21866 (-601)	▲ 5569 (+8)	▼ 770 (-32)	▲ 1012 (+8)	▼ 48 (-1)

Synthesis benchmarks (full)

Device utilisation (ECP5)	LUTs used as DFF (ECP5)	LUTs used as carry (ECP5)	LUTs used as RAM (ECP5)	Max clock frequency (Fmax)
▼ 33465 (-213)	▲ 8811 (+8)	1932 (0)	▲ 1192 (+8)	▲ 42 (+2)

May 05 '24 11:05 github-actions[bot]

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.407 (0.000)	0.527 (0.000)	0.321 (0.000)	0.652 (0.000)	0.345 (0.000)	0.283 (0.000)	0.317 (0.000)	0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation (ECP5)	LUTs used as DFF (ECP5)	LUTs used as carry (ECP5)	LUTs used as RAM (ECP5)	Max clock frequency (Fmax)
▲ 23085 (+163)	▲ 5569 (+8)	▲ 802 (+32)	▲ 1012 (+8)	▼ 46 (-4)

Synthesis benchmarks (full)

Device utilisation (ECP5)	LUTs used as DFF (ECP5)	LUTs used as carry (ECP5)	LUTs used as RAM (ECP5)	Max clock frequency (Fmax)
▼ 32440 (-1665)	▲ 8811 (+8)	▼ 1932 (-32)	▲ 1192 (+8)	▼ 41 (-1)

May 05 '24 13:05 github-actions[bot]

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.407 (0.000)	0.527 (0.000)	0.321 (0.000)	0.652 (0.000)	0.345 (0.000)	0.283 (0.000)	0.317 (0.000)	0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation (ECP5)	LUTs used as DFF (ECP5)	LUTs used as carry (ECP5)	LUTs used as RAM (ECP5)	Max clock frequency (Fmax)
▲ 24084 (+1162)	▲ 5569 (+8)	▲ 802 (+32)	▲ 1012 (+8)	▼ 49 (-1)

Synthesis benchmarks (full)

Device utilisation (ECP5)	LUTs used as DFF (ECP5)	LUTs used as carry (ECP5)	LUTs used as RAM (ECP5)	Max clock frequency (Fmax)
▼ 30477 (-3628)	▲ 8811 (+8)	1964 (0)	▲ 1192 (+8)	▼ 41 (-1)

May 05 '24 13:05 github-actions[bot]

No change in benchmarks, as Wishbone Classic doesn't support pipelining.

May 05 '24 13:05 tilk

> No change in benchmarks, as Wishbone Classic doesn't support pipelining.

Yes, I expected that, but I ran the benchmarks to make sure there is no regression.

May 05 '24 13:05 lekcyjna123
