sincos icon indicating copy to clipboard operation
sincos copied to clipboard

Have you considered using CORDIC

Open drom opened this issue 8 years ago • 13 comments

Hi @jamesbowman nice little library. Good algorithm when you are running on CPU / FPGA with fast HW multiplier. As for Verilog implementation, have you considered using CORDIC algorithm. To my experience it requires way less gates and more pipeline friendly.

drom avatar Jan 07 '17 20:01 drom

hmm: https://github.com/cebarnes/cordic

jamesbowman avatar Jan 07 '17 20:01 jamesbowman

@jamesbowman yep, something like this.

drom avatar Jan 07 '17 20:01 drom

@jamesbowman what is the target application for this library?

drom avatar Jan 07 '17 21:01 drom

Good summary:

http://www.andraka.com/files/crdcsrvy.pdf

jamesbowman avatar Mar 22 '17 15:03 jamesbowman

@jamesbowman I cant access the link. for some reason. Is it correct?

drom avatar Mar 22 '17 15:03 drom

Yes, link is correct. Attached:

crdcsrvy.pdf

jamesbowman avatar Mar 22 '17 15:03 jamesbowman

(Looking at this because my verilog sincos is suddenly a timing critical path.)

A straight 1-cycle implementation of 16-bit CORDIC must do 15 additions, inspecting the MSB of each addition. I don't think this will meet timing! Apart from inserting pipelining, is there any way to shorten this critical path?

image

jamesbowman avatar Mar 22 '17 15:03 jamesbowman

@jamesbowman this a naïve way of computing CORDIC, and it is impractical.

Yes, you would need some pipelining to get 16-bit resolution.

To get a good and fast CORDIC implementation you would probably have to answer few questions.

Do you want 'generic' CORDIC? Meaning: All inputs (x0, y0, z0) are non-zero, and All outputs (xn, yn, zn) are in use, and need to be accurate.

With 'generic' CORDIC you can do (polar -> rectangular) or (rectangular -> polar) or mix.

Real savings can be achieved when you need only (rectangular -> polar) only. When you compute atan(x/y) for example. Is it your case?

drom avatar Mar 22 '17 16:03 drom

The module is only computing sin(x), in Verilog:

module isin(
  input signed [15:0] x,
  output signed [15:0] s);

so that is (polar -> rectangular) with r=1. So x0 is a constant, y0 is zero, z0 is the angle Only the sine output (yn) is required.

Only 12 bits of result (that is, sign + 11 bits) are really needed.

jamesbowman avatar Mar 22 '17 16:03 jamesbowman

in pure (rectangular -> polar) mode you can start with one of MSB lookup tables:

2-bit. gives you a quadrant. 3-bit. need to store 2-value (0.707, 1) 4-bit. 4-value table. 5-bit. 8-value table. etc.

You need to pre-scale values in the table depending of how many stages you will do after.

drom avatar Mar 22 '17 16:03 drom

For the second stage you can lookup next group of bits (2-5) to decide on batch sequence of rotations that you want to perform. Batch rotations don't require intermediate sign values, so they can be done in Sign-Digit (SD) form https://en.wikipedia.org/wiki/Signed-digit_representation

drom avatar Mar 22 '17 17:03 drom

@jamesbowman just to remind that single CORDIC step yields less the 1 bit of precision.

drom avatar Mar 22 '17 17:03 drom

Good paper to read: rodrigues2010.pdf

drom avatar Mar 22 '17 17:03 drom