blarney-lang/blarney: Haskell library for hardware description

trafficstars

:toc: macro :toclevels: 4 :toc-title: :toc-placement!: :source-highlighter:

image::blarney-logo.svg#gh-light-mode-only[Blarney logo, width=275] image::blarney-logo-dark.svg#gh-dark-mode-only[Blarney logo, width=275]

Blarney is a Haskell library for hardware description that builds a range of HDL abstractions on top of a small set of pure functional circuit primitives. It is a modern variant of http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.110.5587&rep=rep1&type=pdf[Lava] with Verilog and SMT backends. Some aspects of the library are also inspired by https://github.com/B-Lang-org/bsc[Bluespec], such as first-class actions and method-based interfaces. Applications of Blarney include https://github.com/CTSRD-CHERI/SIMTight[SIMTight] (a CHERI-enabled RISC-V GPGPU) and https://github.com/blarney-lang/actora[Actora] (a stack machine with a compiler for an Erlang-like language). Below, we introduce the library by example, supplementing the http://blarney-lang.github.io/blarney/index.html[Haddock docs].

[discrete] == Contents

toc::[]

:sectnums:

== Prerequisites

We'll need Verilator and GHC 9.2.1 or later.

On Ubuntu 20.04, we can do:

[source, shell]

$ sudo apt install verilator

For GHC 9.2.1 or later, https://www.haskell.org/ghcup/[ghcup] can be used.

== Quick start

Recursively clone the repo and set the BLARNEY_ROOT environment variable to point to it, and add the Scripts directory to your PATH:

[source, shell]

$ git clone --recursive https://github.com/blarney-lang/blarney $ export BLARNEY_ROOT=$(pwd)/blarney $ export PATH=$PATH:$BLARNEY_ROOT/Scripts

To run an example using Blarney's in-Haskell simulator:

[source, shell]

$ cd Examples/Sorter $ make $ ./Sorter --simulate

You should see the output:

sort [3,4,1,0,2] = [0,1,2,3,4]

To generate Verilog for an example, and then simulate the Verilog using Verilator, omit the --simulate option and then run the generated makefile:

[source, shell]

$ ./Sorter # Run the Verilog generator $ cd Sorter-Verilog $ make # Compile the generated Verilog using Verilator $ ./Sorter # Simulate the generated Verilog

You should see the same output as before. Using Verilator is the recommended approach for simulating Blarney designs.

To run the regression test suite:

[source, shell]

$ cd Test $ ./test.sh --run-all

== Blarney by example

=== Two-sort

Sorting makes for a good introduction to the library. Let's start with the simplest kind of sorter possible: given a pair of 8-bit values, the function twoSort returns the sorted pair.

[source, haskell]

import Blarney

twoSort :: (Bit 8, Bit 8) -> (Bit 8, Bit 8) twoSort (a, b) = a .<. b ? ((a, b), (b, a))

This definition makes use of three Blarney constructs: the Bit type for bit vectors (parametised by the size of the vector); the comparison operator .<.; and the ternary conditional operator ?. A quick test bench to check that it works:

[source, haskell]

top :: Module () top = always do display "twoSort (1,2) = " (twoSort (1,2)) display "twoSort (2,1) = " (twoSort (2,1)) finish

We use Blarney's always construct to perform the given action on every clock cycle. Blarney actions include statements for displaying values during simulation (display), terminating the simulator (finish), and mutating state (see below). All statements in an Action execute in parallel, within a single cycle of an implicit clock. We can generate Verilog for the test bench as follows.

[source, haskell]

main :: IO () main = writeVerilogTop top "top" "/tmp/twoSort/"

Assuming the above code is in a file named Sorter.hs, it can be compiled at the command-line using

[source, shell]

$ blc Sorter.hs

where blc stands for Blarney compiler. This is just a script (from Blarney's Scripts directory) that invokes GHC with the appropriate compiler flags. Running the resulting executable ./Sorter will produce Verilog in the /tmp/twoSort directory, including a makefile to build a Verilator simulator. The simulator can be built and run as follows.

[source, shell]

$ cd /tmp/twoSort $ make $ ./top twoSort (1,2) = (1,2) twoSort (2,1) = (1,2)

=== In-Haskell simulation

Sometimes it can be convenient to skip Verilog generation, and use the in-Haskell simulator.

[source, haskell]

main :: IO () main = simulate top

Now after running ./Sorter we see the test bench output directly.

[source, shell]

$ ./Sorter twoSort (1,2) = (1,2) twoSort (2,1) = (1,2)

The in-Haskell simulator is much slower than Verilator, but can be more convenient for small designs. It is currently an experimental feature.

=== Bubble sort

We can build a general N-element sorter by connecting together multiple two-sorters. One of the simplest ways to do this is the bubble sort network. The key component is a function bubble that takes a list of inputs and returns a new list in which the smallest element comes first.

[source, haskell]

bubble :: [Bit 8] -> [Bit 8] bubble [] = [] bubble [x] = [x] bubble (x:y:rest) = bubble (small:rest) ++ [large] where (small, large) = twoSort (x, y)

If we repeatedly call bubble then we end up with a sorted list.

[source, haskell]

sort :: [Bit 8] -> [Bit 8] sort [] = [] sort xs = smallest : sort rest where smallest:rest = bubble xs

Running the test bench

[source, haskell]

top :: Module () top = always do let inputs = [3, 4, 1, 0, 2] display "sort " inputs " = " (sort inputs) finish

in simulation yields:

sort [3,4,1,0,2] = [0,1,2,3,4]

To see that the sort function really is describing a circuit, let's draw the circuit digram for a 5-element bubble sorter.

    -->.
       |
    -->+---.
       |   |

Inputs -->+---+---. | | | -->+---+---+---. | | | | -->+---+---+---+---. | | | | | v v v v v

            Outputs

The input list is supplied on the left, and the sorted output list is produced at the bottom. Each + denotes a two-sorter that takes inputs from the top and the left, and produces the smaller value to the bottom and the larger value to the right. See https://pdfs.semanticscholar.org/de30/22efc5aec833d7b52bd4770a382fea729bba.pdf[The design and verification of a sorter core] for a more in-depth exploration of sorting circuits in Haskell.

=== Polymorphism

For simplicity, we've made our sorter specific to lists of 8-bit values. But if we look at the types of the primitive functions it uses, we can see that it actually has a more general type.

[source, haskell]

(.<.) :: Cmp a => a -> a -> Bit 1 (?) :: Bits a => Bit 1 -> (a, a) -> a

So .<. can be used on any type in the http://blarney-lang.github.io/blarney/Blarney-Core-Bit.html#t:Cmp[Cmp] (comparator) class. Similarly, ? can be used on any type in the <<Bits>> class (which allows packing to a bit vector and back again). So a more generic definition of twoSort would be:

[source, haskell]

twoSort :: (Bits a, Cmp a) => (a, a) -> (a, a) twoSort (a, b) = a .<. b ? ((a, b), (b, a))

Indeed, this would be the type inferred by the Haskell compiler if no type signature was supplied. Using Haskell's rebindable syntax, we can also use an if-then-else expression instead of the ternary conditional operator:

[source, haskell]

twoSort :: (Bits a, Cmp a) => (a, a) -> (a, a) twoSort (a, b) = if a .<. b then (a, b) else (b, a)

=== Mutable registers

So far, we've only seen display and finish actions inside a Blarney module. Also supported are creation and assignment of registers. To illustrate, here is a module that creates a 4-bit cycleCount register, increments it on each cycle, stopping when it reaches 10.

[source, haskell]

top :: Module () top = do -- Create a register cycleCount :: Reg (Bit 4) <- makeReg 0

always do -- Increment on every cycle cycleCount <== cycleCount.val + 1

-- Display value on every cycle
display "cycleCount = " cycleCount.val

-- Terminate simulation when count reaches 10
when (cycleCount.val .==. 10) do
  display "Finished"
  finish

This example introduces a number of new library functions: makeReg creates a register, initialised to the given value; the val field yeilds the current value of the register; and when allows conditional actions to be introduced. We can use if-then-else in an Action context. For example, the final three lines above could have been written as:

[source, haskell]

-- Terminate simulation when count reaches 10 if cycleCount.val .==. 10 then do display "Finished" finish else display "Not finished"

Running top in simulation gives

cycleCount = 0 cycleCount = 1 cycleCount = 2 cycleCount = 3 cycleCount = 4 cycleCount = 5 cycleCount = 6 cycleCount = 7 cycleCount = 8 cycleCount = 9 cycleCount = 10 Finished

=== Queues

Queues (also known as FIFOs) are a commonly used abstraction in hardware design. Blarney provides http://blarney-lang.github.io/blarney/Blarney-Queue.html[a range of different queue implementations], all of which implement the following interface available when importing Blarney.Queue.

[source, haskell]

-- Queue interface data Queue a = Queue { notEmpty :: Bit 1 -- Is the queue non-empty? , notFull :: Bit 1 -- Is there any space in the queue? , enq :: a -> Action () -- Insert an element (assuming notFull) , deq :: Action () -- Remove the first element (assuming canDeq) , canDeq :: Bit 1 -- Guard on the deq and first methods , first :: a -- View the first element (assuming canDeq) }

The type Queue a represents a queue holding elements of type a, and provides a range of standard functions on queues. The enq method should only be called when notFull is true and the deq method should only be called when canDeq is true. Similarly, the first element of the queue is only valid when canDeq is true. Below, we present the simplest possible implementation of a one-element queue.

[source, haskell]

import Blarney.Queue

-- Simple one-element queue implementation makeSimpleQueue :: Bits a => Module (Queue a) makeSimpleQueue = do -- Register holding the one element reg :: Reg a <- makeReg dontCare

-- Register defining whether or not queue is full full :: Reg (Bit 1) <- makeReg 0

-- Methods return Queue { notFull = full.val .==. 0 , notEmpty = full.val .==. 1 , enq = \a -> do reg <== a full <== 1 , deq = full <== 0 , canDeq = full.val .==. 1 , first = reg.val }

The following simple test bench illustrates how to use a queue.

[source, haskell]

-- Small test bench for queues top :: Module () top = do -- Instantiate a queue of 8-bit values queue :: Queue (Bit 8) <- makeSimpleQueue

-- Create an 8-bit count register count :: Reg (Bit 8) <- makeReg 0

always do count <== count.val + 1

-- Writer side
when queue.notFull do
  queue.enq count.val
  display "Enqueued " count.val

-- Reader side
when queue.canDeq do
  queue.deq
  display "Dequeued " queue.first

-- Terminate after 100 cycles
when (count.val .==. 100) finish

=== Mutable wires

Wires are a feature of the Action monad that offer a way for separate action blocks to communicate within the same clock cycle. Whereas assignment to a register becomes visible on the clock cycle after the assigment occurs, assignment to a wire is visible on the same cycle as the assignment. If no assignment is made to a wire on a particular cycle, then the wire emits its default value on that cycle. When multiple assignments to the same wire occur on the same cycle, the wire emits the bitwise disjunction of all the assigned values.

To illustrate, let's implement an n-bit counter module that supports increment and decrement operations.

[source, haskell]

-- Interface for a n-bit counter data Counter n = Counter { inc :: Action () , dec :: Action () , output :: Bit n }

We'd like the counter to support parallel calls to inc and dec. That is, if inc and dec are called on the same cycle then the counter's output is unchanged. We'll achieve this using wires.

[source, haskell]

makeCounter :: KnownNat n => Module (Counter n) makeCounter = do -- State count :: Reg (Bit n) <- makeReg 0

-- Wires incWire :: Wire (Bit 1) <- makeWire 0 decWire :: Wire (Bit 1) <- makeWire 0

always do -- Increment when (incWire.val .&&. inv decWire.val) do count <== count.val + 1

-- Decrement
when (inv incWire.val .&&. decWire.val) do
  count <== count.val - 1

-- Interface return Counter { inc = do incWire <== 1 dec = do decWire <== 1 output = count.val }

=== Recipes

State machines are a common way of defining the control-path of a circuit. They are typically expressed by doing case-analysis of the current state and manually setting the next state. Quite often however, they can be expressed more neatly in a http://blarney-lang.github.io/blarney/Blarney-Recipe.html[Recipe] -- a simple imperative language with various control-flow constructs.

[source, haskell]

data Recipe = Skip -- Do nothing (in zero cycles) | Tick -- Do nothing (in one cycle) | Action (Action ()) -- Perform action (in one cycle) | Seq [Recipe] -- Execute recipes in sequence | Par [Recipe] -- Fork-join parallelism | Wait (Bit 1) -- Block until condition holds | When (Bit 1) Recipe -- Conditional recipe | If (Bit 1) Recipe Recipe -- If-then-else recipe | While (Bit 1) Recipe -- Loop | Background Recipe -- Run recipe in background

To illustrate, here is a small state machine that computes the factorial of 10.

[source, haskell]

fact :: Module () fact = do -- State n :: Reg (Bit 32) <- makeReg 0 acc :: Reg (Bit 32) <- makeReg 1

-- Compute factorial of 10 let recipe = Seq [ Action do n <== 10 , While (n.val .>. 0) ( Action do n <== n.val - 1 acc <== acc.val * n.val ) , Action do display "fact(10) = " acc.val finish ]

runRecipe recipe

Blarney provides a lightweight compiler for the Recipe language (under 100 lines of code), which we invoke above through the call to runRecipe.

A very common use of recipes is to define test sequences. For example, here is a simple test sequence for the Counter module defined earlier.

[source, haskell]

-- Test-bench for a counter top :: Module () top = do -- Instantiate an 4-bit counter counter :: Counter 4 <- makeCounter

-- Sample test sequence let test = Seq [ Action do counter.inc , Action do counter.inc , Action do counter.inc counter.dec , Action do display "counter = " counter.output finish ]

runRecipe test

Here, we increment counter on the first cycle, and then again on the second. On the third cycle, we both increment and decrement it in parallel. On the fourth cycle, we display the value and terminate the simulator.

=== Statements

For convenience, recipes can also be constucted using do notation. The http://blarney-lang.github.io/blarney/Blarney-Stmt.html[Stmt] monad is simply a wrapper around Recipe, which defines monadic bind as sequential composition. It is entirely syntatic sugar, providing no new functionality.

To illustrate, here's the factorial example from earlier, rewritten using the Stmt monad.

[source, haskell]

fact :: Module () fact = do -- State n :: Reg (Bit 32) <- makeReg 0 acc :: Reg (Bit 32) <- makeReg 1

-- Compute factorial of 10 let stmt = do action do n <== 10 while (n.val .>. 0) do action do n <== n.val - 1 acc <== acc.val * n.val action do display "fact(10) = " acc.val finish

runStmt stmt

We have found that some users prefer Recipe syntax, while others prefer Stmt syntax, so we offer both.

=== Block RAMs

Blarney provides http://blarney-lang.github.io/blarney/Blarney-Core-RAM.html[a variety of block RAM modules] commonly supported on FPGAs. They are all based around the following interface.

[source, haskell]

-- Block RAM interface -- (Parameterised by the address width a and the data width d) data RAM a d = RAM { load :: a -> Action () , store :: a -> d -> Action () , out :: d }

When a load is issued for a given address, the value at that address appears on out on the next clock cycle. When a store is issued, the value is written to the RAM on the current cycle, and a load of the new value can be requested on the subsequent cycle. A parallel load and store should only be issued on the same cycle if the RAM has been created as a dual-port RAM (as opposed to a single-port RAM). To illustrate, here is a test bench that creates a single-port block RAM and performs a store followed by a load.

[source, haskell]

top :: Module () top = do -- Instantiate a 256 element RAM of 5-bit values ram :: RAM (Bit 8) (Bit 5) <- makeRAM

-- Write 10 to ram[0] and read it back again runStmt do action do store ram 0 10 action do load ram 0 action do display "Got " ram.out finish

Somewhat-related to block RAMs are http://blarney-lang.github.io/blarney/Blarney-Core-Module.html#t:RegFile[register files]. The difference is that a register file allows the value at an address to be determined within a clock cycle. It also allows any number of reads and writes to be performed within the same cycle. Register files have the following interface.

[source, haskell]

data RegFile a d = RegFile { index :: a -> d -- Read , update :: a -> d -> Action() -- Write }

To read from a register file, use the index method or the generic lookup operator !. Unlike block RAMs, register files (especially large ones) do not always map efficiently onto hardware, so use with care!

=== Sources, sinks, and streams

[#sources-sinks-streams]

Sources and sinks are commonly-used flow-control abstractions in hardware description. They are often used to implement hardware modules that produce or consume data at a variable rate, depending on internal details of the module that the implementer does not wish to (or is unable to) expose. In Blarney, http://blarney-lang.github.io/blarney/Blarney-SourceSink.html[sources and sinks] are captured by the following interfaces.

[source, haskell]

-- Data is consumed from a source data Source t = Source { -- The next value being produced by the source peek :: t -- Invoke this action to consume the next value , consume :: Action () -- Can the source currently be peeked or consumed? , canPeek :: Bit 1 }

-- Data is injected into a sink data Sink t = Sink { -- Can a value be injected into the sink? canPut :: Bit 1 -- Inject the given value into the sink , put :: t -> Action () }

-- A stream is another name for a source (discussed below) type Stream t = Source t

A queue is both a source and a sink.

[source, haskell]

-- Convert a queue to a source instance ToSource (Queue t) t where toSource :: Queue t -> Source t toSource q = Source { canPeek = q.canDeq , peek = q.first , consume = q.deq }

-- Convert a queue to a sink instance ToSink (Queue t) t where toSink :: Queue t -> Sink t toSink q = Sink { canPut = q.notFull , put = q.enq }

-- Another name for toSource (discussed below) toStream :: ToSource a b => a -> Stream b toStream = toSource

Sources and sinks can be https://blarney-lang.github.io/blarney/Blarney-Connectable.html[connected together].

Note that taking a sink as a function argument (input) is very similar to returning a source as a function result (output). Both allow the function to produce data at a variable rate. Is it therefore redundant to provide both Source and Sink? Not quite. When a function takes a sink as input, it knows when the caller is ready to consume before producing data; when a function returns a source as output, it knows when the caller does consume after producing data. This subtle difference can be important when the programmer wants minimise buffering and latency between producer and consumer. Often though, we don't mind buffering (it's good for Fmax) so our convention is to use the Stream type in most circumstances.

As an example, here's a function that increments each value in an input stream to produce an output stream.

[source, haskell]

inc :: Stream (Bit 8) -> Module (Stream (Bit 8)) inc xs = do -- Output buffer buffer <- makeQueue

always do -- Incrementer when (xs.canPeek .&&. buffer.notFull) do xs.consume buffer.enq (xs.peek + 1)

-- Convert buffer to a stream return (toStream buffer)

=== Modular compilation

So far we've seen examples of top-level modules, i.e. modules with no inputs or outputs, being converted to Verilog. In fact, any Blarney function whose inputs and outputs are members of the http://blarney-lang.github.io/blarney/Blarney-Core-Interface.html[Interface] class can be converted to Verilog (and the Interface class supports generic deriving). To illustrate, we can convert the function inc (defined <<sources-sinks-streams, above>>) into a Verilog module as follows.

[source, haskell]

main :: IO () main = writeVerilogModule inc "inc" "/tmp/inc"

The generated Verilog module /tmp/inc/inc.v has the following interface:

[source, systemverilog]

module inc( input wire clock , input wire reset , output wire [0:0] in0_consume_en , input wire [0:0] in0_canPeek , input wire [7:0] in0_peek , input wire [0:0] out_consume_en , output wire [7:0] out_peek , output wire [0:0] out_canPeek );

Considering the definition of the Stream type, the correspondance between the Blarney and the Verilog is quite clear:

[cols="1,3", options="header"] |=== |Signal |Description

|in0_consume_en |Output asserted whenever the module consumes an element from the input stream.

|in0_canPeek |Input signalling when there is data available in the input stream.

|in0_peek |Input containing the next value in the input stream.

|out_canPeek |Output asserted whenever there is data available in the output stream.

|out_peek |Output containing the next value in the output stream.

|out_consume_en |Input signalling when the caller consumes an element from the output stream. |===

It is also possible to instantiate a Verilog module inside a Blarney description. To illustrate, here is a function that creates an instance of the Verilog inc module shown above.

[source, haskell]

-- This function creates an instance of a Verilog module called "inc" makeInc :: Stream (Bit 8) -> Module (Stream (Bit 8)) makeInc = makeInstance "inc"

Notice that interface of the Verilog module being instantiated is determined from the type signature. Here's a sample top-level module that uses the makeInc function:

[source, haskell]

top :: Module () top = do -- Counter count :: Reg (Bit 8) <- makeReg 0

-- Input buffer buffer <- makeQueue

-- Create an instance of inc out <- makeInc (toStream buffer)

always do -- Fill input when buffer.notFull do buffer.enq count.val count <== count.val + 1

-- Consume
when out.canPeek do
  out.consume
  display "Got " out.peek
  when (out.peek .==. 100) finish

Using the following main function we can generate both the inc module and a top-level module that instantiates it.

[source, haskell]

main :: IO () main = do let dir = "/tmp/inc" writeVerilogModule inc "inc" dir writeVerilogTop top "top" dir

Using this approach, we can maintain the module hierarchy of a Blarney design whenever we generate Verilog, rather than having to flatten it to big monolithic netlist. This technique can also be used to instantiate any Verilog module within a Blarney design.

When simply marking netlist boundaries within a Blarney design, the makeInstance/writeVerilogModule combination is rather low-level and error-prone. In particular, there is no requirement for the type of the instance to match the type of the module, and it would be nice to specify a boundary in a backend-independent way. To solve these problems, Blarney provides a makeBoundary function. We can now define makeInc as:

[source, haskell]

makeInc :: Stream (Bit 8) -> Module (Stream (Bit 8)) makeInc = makeBoundary "inc" inc

Unlike makeInstance, makeBoundary takes the module to instantiate as an argument. The type of the argument to makeBoundary must match the return type:

[source, haskell]

makeBoundary :: Modular m => String -> m -> m

This means that it is unncessary to supply a type signature for makeInc now; it will be inferred. Furthermore, the top-level of our design no longer needs to call writeVerilogModule for the inc module because Blarney now knows how to generate a module for any instance that it encounters.

=== Master-slave pattern

This is a common pattern in hardware design. Suppose we wish to move a multiplier out of a module and into an separate slave module, where the slave takes requests (pairs of 32-bit integers to multiply) and produces responses (32-bit results).

[source, haskell]

type MulReq = (Bit 32, Bit 32) type MulResp = Bit 32

The slave component might be defined as:

[source, haskell]

slave :: Stream MulReq -> Module (Stream MulResp) slave reqs = do resps <- makeQueue

always do when (reqs.canPeek .&&. resps.notFull) do reqs.consume let (a, b) = reqs.peek resps.enq (a * b)

return (toStream resps)

The master component produces requests for the slave, and consumes responses from the slave. In the example below, the master simply asks the slave to multiply 2 by 2, waits for the response, and then terminates the simulation.

[source, haskell]

master :: Stream MulResp -> Module (Stream MulReq) master resps = do reqs <- makeQueue

runStmt do wait reqs.notFull action do reqs.enq (2, 2) wait resps.canPeek action do resps.consume display "Result: " resps.peek finish

return (toStream reqs)

The top-level module which connects the master and the slave needs to introduce a cycle, which can be achieved simply using Haskell's recursive-do (mdo) notation:

[source, haskell]

top :: Module () top = mdo resps <- slave reqs reqs <- master resps return ()

=== Tagged unions

[#tagged-unions]

Sum types such as

[source, haskell]

data Either a b = Left a | Right b

do not permit generic deriving for the Bits class, so cannot be used for circuit-time values. (An elaboration-time value cannot be influenced by a circuit-time value, making the definition of unpack problematic for sum types, at least without resorting to language plugins). However, Blarney does support tagged unions, allowing the following definition.

[source, haskell]

import Blarney.TaggedUnion

type Either a b = TaggedUnion [ "left" ::: a , "right" ::: b ]

The API for tagged unions is illustrated by the sample functions below.

[source, haskell]

makeLeft :: Bits a => a -> Either a b makeLeft x = tag #left x

isLeft :: Either a b -> Bit 1 isLeft x = x is #left

isRight :: Either a b -> Bit 1 isRight x = x is #right

getLeft :: Bits a => Either a b -> a getLeft x = untag #left x

getLeftOrZero :: Bits a => Either a b -> a getLeftOrZero x = untagDefault #left zero x

exampleAction :: Action () exampleAction = do let foo :: Either (Bit 2) (Bit 4) = tag #right 15 whenTagged #right foo \r -> do display "Right val: " r

=== Bit selection and lookup

Bit selection operators are used to extract a subset of bits out of a bit-vector. There are different flavours, depending on whether the indices are type-level numbers, elaboration-time numbers, or circuit-level numbers.

For type-level indices, we provide functions http://blarney-lang.github.io/blarney/Blarney-Core-Bit.html#v:at[at] and http://blarney-lang.github.io/blarney/Blarney-Core-Bit.html#v:slice[slice], and use type application to specify the type-level indices:

[source, haskell]

-- Extract most-sigificant bit of a byte msb :: Bit 8 -> Bit 1 msb x = at @7 x

-- Extract upper 4 bits of a byte upperNibble :: Bit 8 -> Bit 4 upperNibble x = slice @7 @4 x

For elaboration-time indices of type Int, we provide http://blarney-lang.github.io/blarney/Blarney-Core-Bit.html#v:unsafeAt[unsafeAt] and http://blarney-lang.github.io/blarney/Blarney-Core-Bit.html#v:unsafeSlice[unsafeSlice]:

[source, haskell]

-- Extract most-sigificant bit of a byte msb :: Bit 8 -> Bit 1 msb x = unsafeAt 7 x

-- Extract upper 4 bits of a byte upperNibble :: Bit 8 -> Bit 4 upperNibble x = unsafeSlice (7, 4) x

The argument to unsafeAt could be out of range, and the result of unsafeSlice could have a different width to that implied by the range. Such cases will lead to confusing error messages, hence the "unsafe" prefix on the function names.

Finally, for circuit-level indicies of type Bit n, the generic lookup operator ! can be used:

[source, haskell]

-- Extract bit from byte at given index getBit :: Bit 8 -> Bit 3 -> Bit 1 getBit x i = x!i

Blarney's generic lookup operator x!i returns the element of x at index i, and works for many different types of x and i. See <<Lookup>> for more details.

=== Bit-string pattern matching

Recent work on specifying and implementing ISAs led us to develop two libraries for doing bit-string pattern matching. The first, http://blarney-lang.github.io/blarney/Blarney-BitPat.html[BitPat], is statically-typed and based on the paper https://core.ac.uk/download/pdf/50525461.pdf[Type-safe pattern combinators]. The second, http://blarney-lang.github.io/blarney/Blarney-BitScan.html[BitScan], is dynamically typed but more expressive. As an example, BitScan, let's us define the following instruction decoder for a tiny subset of RISC-V.

[source, haskell]

import Blarney.BitScan

-- Semantics of add instruction add :: Bit 5 -> Bit 5 -> Bit 5 -> Action () add rs2 rs1 rd = display "add r" rd ", r" rs1 ", r" rs2

-- Semantics of addi instruction addi :: Bit 12 -> Bit 5 -> Bit 5 -> Action () addi imm rs1 rd = display "addi r" rd ", r" rs1 ", " imm

-- Semantics of store instruciton sw :: Bit 12 -> Bit 5 -> Bit 5 -> Action () sw imm rs2 rs1 = display "sw r" rs2 ", " imm "(r" rs1 ")"

top :: Module () top = always do -- Sample RISC-V store-word instruction let instr :: Bit 32 = 0b1000000_00001_00010_010_00001_0100011

-- Dispatch match instr [ "0000000 rs2[4:0] rs1[4:0] 000 rd[4:0] 0110011" ==> add, " imm[11:0] rs1[4:0] 000 rd[4:0] 0010011" ==> addi, "imm[11:5] rs2[4:0] rs1[4:0] 010 imm[4:0] 0100011" ==> sw ]

finish

The nice thing about this decoder is that the scattered immediate field imm in the sw instruction is automatically assembled by the library. That is, the imm[11:5] part of the immediate is combined with the imm[4:0] part to give the final 12-bit immediate value passed to the right-hand-side function. Scattered immediates appear a lot in the RISC-V specification. Thanks to Jon Woodruff for suggesting this feature! For a fuller example of the BitScan module, see the Pebbles RV32I https://github.com/blarney-lang/pebbles/blob/master/src/Pebbles/Instructions/RV32_I.hs[instruction decoder].

=== CPUs

A few processor cores have been implemented in Blarney:

https://github.com/blarney-lang/blarney/blob/master/Examples/CPU/CPU.hs[Simple]: 4-stage 8-bit CPU, with just 4 instructions, for learning.
https://github.com/blarney-lang/pebbles/[Pebbles]: RISC-V CPU+GPU using plugable pipelines.
https://github.com/blarney-lang/actora/[Actora]: 3-stage stack machine that runs code written a subset of Erlang.

=== Namer plugin

One of the classic limitations of Lava is that identifier names are lost when the netlist is generated. In particular, this is problematic when we want to analyse, say, the critical-path of our circuit using a third-party tool, but there is no way to map the netlist names reported by the tool back to the Lava names in the original description.

Blarney provides a solution to this problem in the form of the https://github.com/blarney-lang/blarney/blob/master/Haskell/BlarneyPlugins/Namer[Namer plugin]. This is a simple GHC plugin (around 150 lines of code) that looks for monadic bindings of the form

[source, haskell]

x <- m

where m has type Module a for any a, and automatically rewrites the binding as

[source, haskell]

x <- withName "x" m

where http://blarney-lang.github.io/blarney/Blarney-Core-Module.html#v:withName[withName] is a Blarney primitive that introduces name information inside m This simple approach captures quite a lot of useful names.

The plugin is completely optional, and disabled by default. To enable it, first install using cabal

[source, shell]

cd Haskell/BlarneyPlugins/Namer cabal v1-install

and then pass the --enable-namer-plugin flag to blc.

To further improve the readability of generated code, we can also pass the --enable-name-prop and --enable-simplifier options to our circuit generator. This will enable the (experimental) name propagation and netlist simplification passes respectively.

== Type class overview

=== Bits

Any type in the http://blarney-lang.github.io/blarney/Blarney-Core-Bits.html[Bits] class can be represented in hardware, e.g. stored in a wire, a register, or a RAM.

[source, haskell]

class Bits a where type SizeOf a :: Nat sizeOf :: a -> Int pack :: a -> Bit (SizeOf a) unpack :: Bit (SizeOf a) -> a

The Bits class supports generic deriving. For example, suppose we have a simple data type for memory requests:

[source, haskell]

data MemReq = MemReq { memOp :: Bit 1 -- Is it a load or a store request? , memAddr :: Bit 32 -- 32-bit address , memData :: Bit 32 -- 32-bit data for stores } deriving (Generic, Bits)

To make this type a member of the Bits class, we have suffixed it with derving (Generic, Bits). The generic deriving mechanism for Bits does not support sum types: there is no way to convert a bit-vector (run-time circuit value) to a sum type (elaboration-time value) using the circuit primitives provided by Blarney (however, see <<tagged-unions, tagged unions>>).

=== Interface

Any type in the http://blarney-lang.github.io/blarney/Blarney-Core-Interface.html[Interface] class can be used as a module input or output when doing <<modular-compilation, modular compilation>>. Furthermore, collections of interfaces can be indexed by circuit-time values using the ! operator. To illustrate, here is an example circuit to split a stream of <<bits, MemReq>> into four streams, using the lower two bits of the address to decide which output stream to use.

[source, haskell]

split :: Stream MemReq -> Module [Stream MemReq] split reqs = do -- Create a list of 4 queues queues :: [Queue MemReq] <- replicateM 4 makeQueue

always do -- Consume request, and put into appropriate queue when reqs.canPeek do let i :: Bit 2 = truncate reqs.peek.memAddr when (queues!i).notFull do reqs.consume (queues!i).enq reqs.peek

return (map toStream queues)

The Interface class supports generic deriving: just add Interface to the deriving clause for the datatype. In the above example, MemReq is an Interface, and so too is Queue a for any a that is also an Interface.

=== Lookup

The generic lookup operator ! is provided by the http://blarney-lang.github.io/blarney/Blarney-Core-Lookup.html[Lookup] class.

[source, haskell]

-- Index a collection 'c' of elements 'e' using index 'i' class Lookup c i e | c -> e where (!) :: c -> i -> e

A wide range of combinations of types are supported. The functional dependency c -> e allows the return type to be inferred from the collection type.

=== FShow

Any value whose type is in the http://blarney-lang.github.io/blarney/Blarney-Core-FShow.html[FShow] class, or any value of type Format, can be passed as arguments to the variadic display function.

[source, haskell]

class FShow a where fshow :: a -> Format fshowList :: [a] -> Format -- Has default definition

-- Abstract data type for things that can be displayed newtype Format

-- Format constructors mempty :: Format -- Empty (from Monoid class) (<>) :: Format -> Format -> Format -- Append (from Monoid class)

As an example, here is how the FShow instance for pairs is defined.

[source, haskell]

-- Example instance: displaying pairs instance (FShow a, FShow b) => FShow (a, b) where fshow (a, b) = fshow "(" <> fshow a <> fshow "," <> fshow b <> fshow ")"

The FShow class supports generic deriving.

The radix and padding used to display a bit vector can be specified using the following functions.

[source, haskell]

-- Display bit vector in binary with given amount of zero padding formatBin :: Int -> Bit n -> Format

-- Display bit vector in decimal with given amount of zero padding formatDec :: Int -> Bit n -> Format

-- Display bit vector in hex with given amount of zero padding formatHex :: Int -> Bit n -> Format

The FShow instance for Bit n uses decimal format with no padding.

=== Cmp

The Cmp (comparator) class provides a range of familiar comparison operators, and supports generic deriving.

[source, Haskell]

class Cmp a where (.<.) :: a -> a -> Bit 1 (.<=.) :: a -> a -> Bit 1 (.==.) :: a -> a -> Bit 1 (.>.) :: a -> a -> Bit 1 (.>=.) :: a -> a -> Bit 1 (.!=.) :: a -> a -> Bit 1

Only the first three operators must be defined; the others have default definitions.

=== Assign

The assignment operator is overloaded.

[source, Haskell]

class Assign v where (<==) :: Bits a => v a -> a -> Action ()

Example instances are http://blarney-lang.github.io/blarney/Blarney-Core-Module.html#t:Reg[Reg], http://blarney-lang.github.io/blarney/Blarney-Core-Module.html#t:Wire[Wire], and http://blarney-lang.github.io/blarney/Blarney-Core-Module.html#t:WriteOnly[WriteOnly].

=== ToSource and ToSink

Converting interfaces to http://blarney-lang.github.io/blarney/Blarney-SourceSink.html[sources and sinks] may turn out to be common. For example, http://blarney-lang.github.io/blarney/Blarney-Queue.html[Queue] and http://blarney-lang.github.io/blarney/Blarney-Stack.html[Stack] are both sources and sinks. Therefore the following type classes are provided.

[source, Haskell]

-- Convert to a source class ToSource a b | a -> b where toSource :: a -> Source b

-- Convert to a sink class ToSink a b | a -> b where toSink :: a -> Sink b

blarney blarney copied to clipboard

Metadata

[source, shell]

$ sudo apt install verilator

[source, shell]

$ git clone --recursive https://github.com/blarney-lang/blarney $ export BLARNEY_ROOT=$(pwd)/blarney $ export PATH=$PATH:$BLARNEY_ROOT/Scripts

[source, shell]

$ cd Examples/Sorter $ make $ ./Sorter --simulate

sort [3,4,1,0,2] = [0,1,2,3,4]

[source, shell]

$ ./Sorter # Run the Verilog generator $ cd Sorter-Verilog $ make # Compile the generated Verilog using Verilator $ ./Sorter # Simulate the generated Verilog

[source, shell]

$ cd Test $ ./test.sh --run-all

[source, haskell]

twoSort :: (Bit 8, Bit 8) -> (Bit 8, Bit 8) twoSort (a, b) = a .<. b ? ((a, b), (b, a))

[source, haskell]

top :: Module () top = always do display "twoSort (1,2) = " (twoSort (1,2)) display "twoSort (2,1) = " (twoSort (2,1)) finish

[source, haskell]

main :: IO () main = writeVerilogTop top "top" "/tmp/twoSort/"

[source, shell]

$ blc Sorter.hs

[source, shell]

$ cd /tmp/twoSort $ make $ ./top twoSort (1,2) = (1,2) twoSort (2,1) = (1,2)

[source, haskell]

main :: IO () main = simulate top

[source, shell]

$ ./Sorter twoSort (1,2) = (1,2) twoSort (2,1) = (1,2)

[source, haskell]

bubble :: [Bit 8] -> [Bit 8] bubble [] = [] bubble [x] = [x] bubble (x:y:rest) = bubble (small:rest) ++ [large] where (small, large) = twoSort (x, y)

[source, haskell]

sort :: [Bit 8] -> [Bit 8] sort [] = [] sort xs = smallest : sort rest where smallest:rest = bubble xs

[source, haskell]

top :: Module () top = always do let inputs = [3, 4, 1, 0, 2] display "sort " inputs " = " (sort inputs) finish

sort [3,4,1,0,2] = [0,1,2,3,4]

[source, haskell]

(.<.) :: Cmp a => a -> a -> Bit 1 (?) :: Bits a => Bit 1 -> (a, a) -> a

[source, haskell]

twoSort :: (Bits a, Cmp a) => (a, a) -> (a, a) twoSort (a, b) = a .<. b ? ((a, b), (b, a))

[source, haskell]

twoSort :: (Bits a, Cmp a) => (a, a) -> (a, a) twoSort (a, b) = if a .<. b then (a, b) else (b, a)

[source, haskell]

[source, haskell]

-- Terminate simulation when count reaches 10 if cycleCount.val .==. 10 then do display "Finished" finish else display "Not finished"

cycleCount = 0 cycleCount = 1 cycleCount = 2 cycleCount = 3 cycleCount = 4 cycleCount = 5 cycleCount = 6 cycleCount = 7 cycleCount = 8 cycleCount = 9 cycleCount = 10 Finished

[source, haskell]

[source, haskell]

-- Methods return Queue { notFull = full.val .==. 0 , notEmpty = full.val .==. 1 , enq = \a -> do reg <== a full <== 1 , deq = full <== 0 , canDeq = full.val .==. 1 , first = reg.val }

[source, haskell]

[source, haskell]

-- Interface for a n-bit counter data Counter n = Counter { inc :: Action () , dec :: Action () , output :: Bit n }

[source, haskell]

-- Interface return Counter { inc = do incWire <== 1 dec = do decWire <== 1 output = count.val }

[source, haskell]

[source, haskell]

runRecipe recipe

[source, haskell]

runRecipe test

[source, haskell]

runStmt stmt

[source, haskell]

-- Block RAM interface -- (Parameterised by the address width a and the data width d) data RAM a d = RAM { load :: a -> Action () , store :: a -> d -> Action () , out :: d }

[source, haskell]

-- Write 10 to ram[0] and read it back again runStmt do action do store ram 0 10 action do load ram 0 action do display "Got " ram.out finish

[source, haskell]

data RegFile a d = RegFile { index :: a -> d -- Read , update :: a -> d -> Action() -- Write }

[source, haskell]

-- A stream is another name for a source (discussed below) type Stream t = Source t

[source, haskell]

-- Another name for toSource (discussed below) toStream :: ToSource a b => a -> Stream b toStream = toSource

[source, haskell]

-- Convert buffer to a stream return (toStream buffer)

[source, haskell]

main :: IO () main = writeVerilogModule inc "inc" "/tmp/inc"

[source, systemverilog]

module inc( input wire clock , input wire reset , output wire [0:0] in0_consume_en , input wire [0:0] in0_canPeek , input wire [7:0] in0_peek , input wire [0:0] out_consume_en , output wire [7:0] out_peek , output wire [0:0] out_canPeek );

[source, haskell]

-- This function creates an instance of a Verilog module called "inc" makeInc :: Stream (Bit 8) -> Module (Stream (Bit 8)) makeInc = makeInstance "inc"

[source, haskell]

[source, haskell]

main :: IO () main = do let dir = "/tmp/inc" writeVerilogModule inc "inc" dir writeVerilogTop top "top" dir

blarney
blarney copied to clipboard