SGen
SGen is a generator capable of producing efficient hardware designs for a variety of signal processing transforms. These designs operate on streaming data, meaning that the dataset is divided into several chunks that are processed during several cycles, thus allowing a reduced use of resources. The size of these chunks is called the streaming width. As an example, the figures below represent three discrete Fourier transforms on 8 elements, with a streaming width of 8 (no streaming), 4 and 2.
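The chunking described above can be pictured with a small Python sketch. It is not part of SGen: the function name `stream_chunks` is made up for this illustration, and grouping consecutive elements into the same chunk is an assumption about how data is ordered on the ports.

```python
def stream_chunks(dataset, streaming_width):
    """Split a dataset into chunks of `streaming_width` elements (one chunk per cycle).

    Hypothetical helper for illustration only; the element-to-port ordering
    (consecutive elements share a chunk) is an assumption.
    """
    if streaming_width <= 0 or len(dataset) % streaming_width != 0:
        raise ValueError("the streaming width must divide the dataset size")
    return [dataset[i:i + streaming_width]
            for i in range(0, len(dataset), streaming_width)]


data = list(range(8))
for width in (8, 4, 2):
    chunks = stream_chunks(data, width)
    # A smaller streaming width means fewer ports but more cycles per dataset.
    print(f"streaming width {width}: {len(chunks)} cycle(s), chunks = {chunks}")
```

With a streaming width of 8 the whole dataset is processed in a single cycle, while widths of 4 and 2 spread it over 2 and 4 cycles respectively.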
The generator outputs a Verilog file that can be used for FPGAs.
- A technical overview and an interface to download various generated designs is available here.
- Feel free to report any bug, issue, or suggestion to François Serre.
Quick Start
The easiest way to use SGen is through SBT. The following commands, run in a Windows or Linux console, generate a streaming Walsh-Hadamard transform on 8 points:
git clone https://github.com/fserre/sgen.git
cd sgen
sbt "run -n 3 wht"
The following section describes the different parameters that can be used.
Command-line interface
An SGen command line consists of a list of parameters followed by the desired transform.
Parameters
The following parameters can be used:
- -n n: Logarithm of the size of the transform. As an example, -n 3 means that the transform operates on 2^3=8 points. This parameter is required.
- -k k: Logarithm of the streaming width of the implementation. -k 2 means that the resulting implementation will have 2^2=4 input ports and 4 output ports, and will perform a transform every 2^(n-k) cycles. If this parameter is not specified, the implementation is not folded, i.e. it will have one input port and one output port for each data point, and will perform one transform per cycle.
- -r r: Logarithm of the radix (for DFTs and WHTs). This parameter specifies the size of the base transform used in the algorithm. r must divide n, and, for compact designs (dftcompact and whtcompact), be smaller than or equal to k. It is ignored by transforms that do not require it (permutations). If this parameter is not specified, the highest possible radix is used.
- -o filename: Name of the output file.
- -benchmark: Adds a benchmark module to the generated design.
- -rtlgraph: Produces a DOT graph representing the design.
- -dualramcontrol: Uses dual control for memory (read and write addresses are computed independently). This option yields designs that use more resources than with single RAM control (default), but that have more flexible timing constraints (see the description in generated files). This option is automatically enabled for compact designs.
- -singleported: Uses single-ported memory (read and write addresses are the same). This option has the same constraints as single RAM control (default), but may have a higher latency.
- -zip: Creates a zip file containing the design and its dependencies (e.g. FloPoCo modules).
- -hw repr: Hardware arithmetic representation of the input data. repr can be one of the following:
  - char: Signed integer of 8 bits. Equivalent of signed 8.
  - short: Signed integer of 16 bits. Equivalent of signed 16.
  - int: Signed integer of 32 bits. Equivalent of signed 32.
  - long: Signed integer of 64 bits. Equivalent of signed 64.
  - uchar: Unsigned integer of 8 bits. Equivalent of unsigned 8.
  - ushort: Unsigned integer of 16 bits. Equivalent of unsigned 16.
  - uint: Unsigned integer of 32 bits. Equivalent of unsigned 32.
  - ulong: Unsigned integer of 64 bits. Equivalent of unsigned 64.
  - float: Single precision floating point (32 bits). Equivalent of ieee754 8 23.
  - double: Double precision floating point (64 bits). Equivalent of ieee754 11 52.
  - half: Half precision floating point (16 bits). Equivalent of ieee754 5 10.
  - minifloat: Minifloat of 8 bits. Equivalent of ieee754 4 3.
  - bfloat16: bfloat16 floating point. Equivalent of ieee754 8 7.
  - unsigned size: Unsigned integer of size bits.
  - signed size: Signed integer of size bits. Equivalent of fixedpoint size 0.
  - fixedpoint integer fractional: Signed fixed-point representation with an integer part of integer bits and a fractional part of fractional bits.
  - flopoco wE wF: FloPoCo floating point representation with an exponent size of wE bits, and a mantissa of wF bits. The resulting design will depend on the corresponding FloPoCo generated arithmetic operators, which must be placed in the flopoco folder. If the corresponding vhdl file is not present, SGen provides the command line to generate it. Custom options (e.g. frequency or target) can be used.
  - ieee754 wE wF: IEEE754 floating point representation with an exponent size of wE bits, and a mantissa of wF bits. Arithmetic operations are performed using FloPoCo. Note that, unless otherwise specified when generating FloPoCo operators, denormal numbers are flushed to zero.
  - complex repr: Cartesian complex number. Represented by the concatenation of its coordinates, each represented using the repr arithmetic representation.
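To make the equivalences in the -hw list easier to check, here is a small Python sketch that computes the width in bits of a representation. It is not SGen code. The helper `bit_width` is hypothetical, and it relies on two assumptions inferred from the list above: an ieee754 wE wF value occupies 1 + wE + wF bits (one sign bit), and a fixedpoint value occupies integer + fractional bits (sign included in the integer part, consistent with signed size being equivalent to fixedpoint size 0). The flopoco format is left out on purpose.

```python
# Shorthand names and their documented equivalents.
NAMED_REPRS = {
    "char": "signed 8", "short": "signed 16", "int": "signed 32", "long": "signed 64",
    "uchar": "unsigned 8", "ushort": "unsigned 16", "uint": "unsigned 32",
    "ulong": "unsigned 64",
    "float": "ieee754 8 23", "double": "ieee754 11 52", "half": "ieee754 5 10",
    "minifloat": "ieee754 4 3", "bfloat16": "ieee754 8 7",
}


def bit_width(repr_string):
    """Width in bits of a representation such as 'complex fixedpoint 8 8'.

    Hypothetical helper; the width formulas are assumptions (see lead-in).
    """
    kind, *args = repr_string.split()
    if kind in NAMED_REPRS:
        return bit_width(NAMED_REPRS[kind])
    if kind == "complex":
        # Concatenation of two coordinates using the same representation.
        return 2 * bit_width(" ".join(args))
    if kind in ("signed", "unsigned"):
        return int(args[0])
    if kind == "fixedpoint":
        return int(args[0]) + int(args[1])       # assumed: sign is part of the integer bits
    if kind == "ieee754":
        return 1 + int(args[0]) + int(args[1])   # assumed: sign + exponent + mantissa
    raise ValueError(f"unsupported representation: {repr_string!r}")


for name in ("float", "double", "half", "minifloat", "bfloat16", "complex fixedpoint 8 8"):
    print(f"{name}: {bit_width(name)} bits")
```

Under these assumptions the computed widths agree with the sizes stated in the list (32, 64, 16, 8 and 16 bits for the floating-point shorthands).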
Transforms
Supported transforms are the following:
Streaming linear permutations
Linear permutations can be implemented using the lp command:
# generates a bit-reversal permutation on 32 points, streamed on 2^2=4 ports.
sbt "run -n 5 -k 2 lp bitrev"
# generates a streaming datapath on 2 ports that performs a bit-reversal permutation on 8 points on the first dataset, and a "half-reversal" on the second dataset
sbt "run -n 3 -k 1 lp bitrev 100110111"
The command lp takes as parameter the invertible bit-matrix representing the linear permutation (see this publication for details) in row-major order. Alternatively, the matrix can be replaced by the keyword bitrev or identity.
Several bit-matrices can be listed (separated by a space) to generate a datapath performing several permutations. In this case, the first permutation is applied to the first dataset entering, the second one to the second dataset, and so on.
The resulting implementation supports full throughput, meaning that no delay is required between two datasets.
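As an illustration of what a bit-matrix describes, the following Python sketch applies a row-major bit-matrix to the binary index of each element, working modulo 2. It is a software model only, not SGen code. The function names are made up, and the conventions used (index bits taken most significant first, and the element at index i moving to position P·i) are assumptions; the publication mentioned above defines the actual convention.

```python
def parse_bit_matrix(bits, n):
    """Turn a row-major string of n*n characters '0'/'1' into a list of rows."""
    if len(bits) != n * n or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of n*n characters, each '0' or '1'")
    return [[int(bits[r * n + c]) for c in range(n)] for r in range(n)]


def apply_matrix(matrix, index):
    """Multiply the matrix by the bit vector of `index` over GF(2).

    Assumption: the most significant bit of the index comes first.
    """
    n = len(matrix)
    vec = [(index >> (n - 1 - c)) & 1 for c in range(n)]
    result = 0
    for row in matrix:
        bit = 0
        for m, v in zip(row, vec):
            bit ^= m & v          # addition modulo 2
        result = (result << 1) | bit
    return result


def permute(data, bits):
    """Move data[i] to position P*i (assumed direction), P given as a bit string."""
    n = len(data).bit_length() - 1
    if len(data) != 1 << n:
        raise ValueError("the dataset size must be a power of two")
    matrix = parse_bit_matrix(bits, n)
    targets = [apply_matrix(matrix, i) for i in range(len(data))]
    if sorted(targets) != list(range(len(data))):
        raise ValueError("the bit-matrix is not invertible")
    out = list(data)
    for i, t in enumerate(targets):
        out[t] = data[i]
    return out


# Bit reversal on 8 points is the anti-diagonal matrix 001 / 010 / 100.
print(permute(list(range(8)), "001010100"))
# The second matrix of the command-line example above.
print(permute(list(range(8)), "100110111"))
```

The invertibility check mirrors the requirement that the matrix be invertible: a singular matrix would send two indices to the same position.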
Fourier Transforms (full throughput)
Fourier transforms (with full throughput, i.e. without delay between datasets) can be generated using the dft command:
# generates a Fourier transform on 16 points, streamed on 4 ports, with a fixed-point complex datatype with an integer part of 8 bits and a fractional part of 8 bits.
sbt "run -n 4 -k 2 -hw complex fixedpoint 8 8 dft"
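To give a feel for the fixedpoint 8 8 format used in this example, here is a minimal Python sketch of a signed fixed-point encoding. It is not taken from SGen: the helpers `to_fixed` and `from_fixed` are hypothetical, and the details (two's complement range, sign counted in the integer part, rounding to nearest, saturation on overflow) are assumptions that the generated hardware may not share.

```python
def to_fixed(value, integer_bits, fractional_bits):
    """Encode a real number as a signed fixed-point raw integer.

    Assumptions: two's complement, sign bit counted in integer_bits,
    round to nearest, saturate when out of range.
    """
    total_bits = integer_bits + fractional_bits
    raw = round(value * (1 << fractional_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, raw))


def from_fixed(raw, fractional_bits):
    """Decode a raw fixed-point integer back to a real number."""
    return raw / (1 << fractional_bits)


for x in (1.5, -0.25, 3.14159, 1000.0):
    raw = to_fixed(x, 8, 8)
    print(f"{x} -> raw {raw} -> {from_fixed(raw, 8)}")
```

With 8 fractional bits the resolution is 1/256, so values such as 3.14159 are only approximated, and 1000.0 falls outside the representable range in this model.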
Fourier Transforms (compact design)
Fourier transforms (with an architecture that reuses the same hardware several times) can be generated using the dftcompact command:
# generates a Fourier transform on 1024 points, streamed on 8 ports, with a fixed-point complex datatype with an integer part of 8 bits and a fractional part of 8 bits.
sbt "run -n 10 -k 3 -hw complex fixedpoint 8 8 dftcompact"
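The constraints on -n, -k and -r from the Parameters section can be checked by hand for the two DFT examples above. The Python sketch below does this; it is only an illustration, the function names are made up, and accepting any k between 0 and n is an assumption.

```python
def ports_and_cycles(n, k=None):
    """Number of input (or output) ports and cycles per transform.

    When k is omitted the design is not folded: one port per data point.
    """
    k = n if k is None else k
    if not 0 <= k <= n:          # assumed valid range for k
        raise ValueError("k must be between 0 and n")
    return 2 ** k, 2 ** (n - k)


def valid_log_radices(n, k=None, compact=False):
    """Values of -r that satisfy the documented constraints.

    r must divide n and, for compact designs, be smaller than or equal to k.
    """
    k = n if k is None else k
    radices = [r for r in range(1, n + 1) if n % r == 0]
    if compact:
        radices = [r for r in radices if r <= k]
    return radices


# dft example: -n 4 -k 2
print(ports_and_cycles(4, 2), valid_log_radices(4, 2))
# dftcompact example: -n 10 -k 3
print(ports_and_cycles(10, 3), valid_log_radices(10, 3, compact=True))
```

For the compact example, r must divide 10 and be at most 3, which leaves r = 1 or r = 2, i.e. a radix of 2 or 4.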
RAM control
In the case of a streaming design (n > k), memory modules may need to be used. In this case, SGen lets you choose the control strategy used for these modules:
- Dual RAM control: read and write addresses are computed independently. This offers the highest flexibility (a dataset can be input at any time after the previous one), but it uses more resources. It is automatically used for compact designs (dftcompact), but can be enabled for other designs using the -dualRAMcontrol parameter.
- Single RAM control: the write address is the same as the read address, delayed by a constant time. This uses fewer resources, but is less flexible: a dataset must either be input immediately after the previous one, or wait until the previous dataset has been completely output. This is the default mode (except for compact designs).
- Single-ported RAM: write and read addresses are the same. This has the same constraints as Single RAM control, but may have a higher latency.
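The timing constraints of the three modes can be summarised with a simplified Python model. This is a sketch under stated assumptions, not SGen behaviour: the function `can_start` and its parameters are hypothetical, cycle boundaries are idealised, and the exact constraints are the ones described in the generated files. The previous dataset is assumed to start entering at cycle 0 and to occupy the inputs for `dataset_cycles` cycles (2^(n-k)); `drained_at` is the first cycle at which it has been completely output.

```python
def can_start(mode, start, dataset_cycles, drained_at):
    """Return True if a new dataset may start entering at cycle `start`.

    Simplified, hypothetical model of the constraints listed above.
    """
    if start < dataset_cycles:
        return False                      # inputs still busy with the previous dataset
    if mode == "dual":
        return True                       # any time after the previous dataset
    if mode in ("single", "singleported"):
        back_to_back = start == dataset_cycles
        after_drain = start >= drained_at
        return back_to_back or after_drain
    raise ValueError(f"unknown mode: {mode!r}")


# Example: 4 cycles per dataset, previous dataset fully output at cycle 12 (made-up value).
for mode in ("dual", "single", "singleported"):
    allowed = [s for s in range(16) if can_start(mode, s, 4, 12)]
    print(f"{mode}: allowed start cycles {allowed}")
```

In this model, dual RAM control accepts every start cycle from 4 onwards, while the two other modes accept cycle 4 (back to back) and then nothing until cycle 12.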