clash-compiler
clash-compiler copied to clipboard
Question about the performance of the Fixed type
I'm implementing an FM radio receiver in clash here. The fixed point data types have been very useful for keeping track of the decimal point and make DSP super nice in Clash.
However, the performance of the generated code is not so nice. As stated in the docs, the Num operations saturate on overflow. This adds extra logic and prevents Vivado from using the very efficient DSP blocks. My filters implemented using the Signed type take 2 levels of logic, but using the Fixed type they blow out to 10+.
Why do the fixed types saturate on overflow? IMO, they should just wrap and be a zero-cost type-level way of tracking where the decimal point is. When doing DSP stuff you usually just size things so overflow wont happen, or you use accumulators that are big enough and only perform the saturation at the end.
Would it be possible to change the fixed types to not wrap so that the generated logic is more efficient? Alternatively, I can implement my own type, but I think clash users would benefit from a more efficient Fixed type.
In general we try to offer zero-cost abstractions first, so I'm not sure why they're saturating by default. However, you could try and wrap using Clash.Num.Wrapping. Does that lead to DSP blocks getting inferred?
Oh that's cool. That will almost certainly solve my problems.
Alright :). Let us know if it inferred alright; if it doesn't we can rename + leave this issue open.
Wrapping
is probably a more pleasant interface, but it can also be decided on a case-by-case basis for Fixed
by using Clash.Class.Num.SaturatingNum. Perhaps unexpectedly, you can use SaturatingNum
to turn saturation off.
I believe it was chosen to be saturating by default, although it was always somewhat controversial, as it was deemed to be more commonly used. Integers are used in all sorts of applications including indexing stuff, but Fixed
has a more narrow scope, and it was thought it was better served with saturation by default.
Yep, I thought about using SaturatingNum, but the problem with doing it on a case by case basis is that I would really like to leave the implementation of the arithmetic ops up to the type. It is conceivable that I would want my filters to do saturating addition with a performance penalty in some situation. So yeah, I agree that Wrapping is the more pleasant interface.
Well, I disagree that saturation is the most commonly used case for Fixed, but that's a matter of personal opinion :). It removes the possibility of using DSP blocks in FPGAs that have them because DSPs contain an adder followed by a register within the block. There is no option for inserting in extra logic to do saturation between the adder and the register.
I did try to use Wrapping but unfortunately I was not successful. I have hardcoded mac units that I know map to DSP blocks. Problem is they work on Signeds (I don't know how to write them generically to work on both Signed and SFixeds) so I just use the coerce function to make them work on SFixeds since the underlying representation is the same. I cant use coerce with Wrapping though since the constructor is not exported.
I'm gonna think about how to do this more elegantly.
You should be able to use bitCoerce
to convert to and from Wrapping SFixed and Signed, without needing the constructor
You could also use fromWrapping
, it should be free in Haskell/HDL.
Yep, I can use bitCoerce and toWrapping/fromWrapping. That would work but I feel there is too much wrapping and unwrapping going on. I want to use my mac functions directly without creating wrappers to unwrap the arguments and rewrap the result.
I'm trying to make a library of guaranteed efficient digital filters where you can instantiate one in a couple of lines and the types make it super easy and keep track of everything for you. So, I'm being a bit pedantic about avoiding things like wrapping/unwrapping.
Happy to close this issue if you guys are ok with the current behaviour where Fixed saturates. I can make my own type as part of my filter library if needed, or make my MAC functions work with Wrapping SFixeds
Hmm, well ideally Vivado would infer the DSPs when using Wrapping
. How you got any idea why it doesn't?
I'm sure it will, but I haven't tried yet.
The problem is that I have to change this line and a few others to manually wrap and unwrap the arguments and result of the macPreAddRealRealPipelined
function, since coerce wont work with Wrapping SFixed
No big deal, but I was hoping to avoid boilerplate like that.
Yeah, so we should just export the constructor. I don't see a reason not to. And if there is, we should export it from an Internal
module anyway.
Yep, that would solve my issue.
Feel free to open a PR if we're not fast enough!
We've released v1.6.5, which includes a fix for this issue.