Lookup Tables for Trigonometric Functions
Dear all,
Regarding constant function optimization, the Halide-HLS paper (Programming Heterogeneous Systems from an Image Processing DSL) states:
Enabling more design optimizations, the compiler can also statically evaluate constant functions (e.g. lookup tables), and generate the code that later synthesizes to ROMs.
Some examples in the hls_examples directory of Halide-HLS (e.g. unsharp_hls) use an exponential function in a reduction domain, and the values of that exponential function are replaced with constants in the generated HLS code.
I am trying to infer lookup tables for sin(x) and cos(y) in the following code:
Input(x,y) = Input_Image(x, y);
fy_f(y) = cos(y);
fx_f(x) = sin(x);
fy(y) = Halide::cast<uint16_t>(fy_f(y) * 65535);
fx(x) = Halide::cast<uint16_t>(fx_f(x) * 65535);
fxy(x, y) = fx(x) * fy(y);
hw_output(x,y) = Input(x,y) * fxy(x, y);
output(x,y) = hw_output(x,y);
The generated HLS code contains calls to sin_f32() and cos_f32() whose arguments come from loop indices, and Vivado HLS does not use lookup tables for those functions, even though the loop index ranges are known.
I know that, instead of calling sin() and cos() explicitly in the Halide code, we can use constant arrays holding those functions evaluated at the corresponding indices. But I wonder whether the Halide-HLS compiler can generate lookup tables directly for those functions, and not only in the reduction-domain manner it uses in the unsharp_hls example. Is there a Halide primitive that can be used in this situation?
Thanks!
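The constant-array workaround described in the question can be sketched in plain C++ as follows. This is only a minimal illustration, not Halide-HLS code: the table size kTableSize and the helper names make_sin_table and sin_lookup are hypothetical, and the sketch scales to signed 16-bit values (an assumption that differs from the uint16_t cast in the question) because sin() also takes negative values.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical table size; in a real pipeline it would match the image extent.
constexpr std::size_t kTableSize = 256;

// Evaluate sin(i) once per index and store it as signed 16-bit fixed point.
// Assumption: the values are scaled by 32767 so that [-1, 1] fits in int16_t.
inline std::array<int16_t, kTableSize> make_sin_table() {
    std::array<int16_t, kTableSize> table{};
    for (std::size_t i = 0; i < kTableSize; ++i) {
        const double s = std::sin(static_cast<double>(i));
        table[i] = static_cast<int16_t>(std::lround(s * 32767.0));
    }
    return table;
}

// Replace the runtime sin() call with a plain array read.
inline int16_t sin_lookup(const std::array<int16_t, kTableSize>& table,
                          std::size_t i) {
    assert(i < kTableSize);
    return table[i];
}
```

The table would be built once on the host side and then read by index wherever the pipeline previously called sin(x); a cosine table can be built the same way.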
One way to generate a lookup table for this is to treat fy_f and fx_f as taps. If you leave them out of the input argument list of accelerate_at, for example by writing accelerate_at({Input}, ...), then fy_f and fx_f will be generated as taps. In the HLS code, you will see stencils corresponding to these two taps. Take a look at the conv_hls example to see how taps are generated. Note that by default the stencils are partitioned completely, so they are implemented as registers; if you remove the array_partition pragma for those stencils in the HLS code, vivado_hls will generate lookup tables for them.
Let me know if that is not clear.
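The effect of the array_partition pragma mentioned above can be sketched as follows. This is an illustration under assumptions, not the code Halide-HLS emits: the function names and the table contents are hypothetical, and a plain array stands in for the generated stencil type. A regular C++ compiler ignores the HLS pragma, so both variants behave identically in software; the difference only shows up in synthesis.

```cpp
#include <cassert>
#include <cstdint>

// Variant A: the array is partitioned completely, so the tool is asked to
// implement every element as its own register.
uint16_t lookup_partitioned(uint8_t idx) {
    // Hypothetical tap values.
    static const uint16_t taps[8] = {0, 100, 200, 300, 400, 500, 600, 700};
#pragma HLS ARRAY_PARTITION variable=taps complete dim=0
    assert(idx < 8);
    return taps[idx];
}

// Variant B: same table without the pragma, which leaves the tool free to
// map the constant array to a memory (a lookup table / ROM).
uint16_t lookup_rom(uint8_t idx) {
    // Hypothetical tap values, identical to variant A.
    static const uint16_t taps[8] = {0, 100, 200, 300, 400, 500, 600, 700};
    assert(idx < 8);
    return taps[idx];
}
```

Whether variant B actually ends up as a ROM depends on the tool and on the array size, so the synthesis report is the place to confirm it.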
@xuanyoya Thank you for your response. In the conv_hls example, as you mentioned, the exponential functions are generated as tap stencils, which are inputs to the hardware accelerator (hls_target). These tap stencils are computed in the test bench file (pipeline_hls.cpp) and are not part of the hardware accelerator, so the lookup tables are not actually synthesized; only an interface for them is created. Am I correct? If so, and we want these lookup tables in hardware, should we manually create another top-level wrapper function that contains hls_target and the lookup tables, or can Halide-HLS do that automatically in some way?
Thanks!
@amisal88 The ROM optimization mentioned in the paper is inferred from an unroll directive applied to a LUT function. Loop unrolling together with constant propagation moves the math from runtime to code generation time. See the curve function in the camera_pipe app as an example.
https://github.com/jingpu/Halide-HLS/blob/HLS/apps/hls_examples/camera_pipe_hls/pipeline.cpp#L551
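The idea of moving the math from runtime to code generation time can be illustrated with a plain C++ analogy. This is not how Halide-HLS implements it, and the names curve_value, CurveTable and apply_curve are hypothetical. A loop with constant bounds over a pure function is evaluated by the compiler, so only a table of constants remains in the generated program. The curve here is a simple integer quadratic, chosen because std::sin is not usable in constant expressions in current C++ standards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCurveSize = 16;

// Hypothetical LUT function: pure, and dependent only on the index.
constexpr uint16_t curve_value(std::size_t i) {
    return static_cast<uint16_t>(i * i);
}

struct CurveTable {
    uint16_t v[kCurveSize];
};

// The loop bounds are constants, so the compiler can evaluate every
// iteration itself and keep only the resulting values.
constexpr CurveTable make_curve() {
    CurveTable t{};
    for (std::size_t i = 0; i < kCurveSize; ++i) {
        t.v[i] = curve_value(i);
    }
    return t;
}

constexpr CurveTable kCurve = make_curve();

// Compiles only if the entry is known at compile time.
static_assert(kCurve.v[3] == 9, "table entries are compile-time constants");

// At runtime only an array read remains.
inline uint16_t apply_curve(std::size_t i) {
    assert(i < kCurveSize);
    return kCurve.v[i];
}
```

In the Halide-HLS flow the analogous step is performed by the compiler on the unrolled LUT function, and the resulting constant array is what later synthesizes to a ROM.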