clash-compiler Bad performance when pattern-matching bit by bit

Please find a full standalone compilable code at the end of this post.

I am experiencing very bad scaling from the CLaSH synthesizer when it is compiling the instruction fetch/decode function for an Intel 8080. If I try to compile my code as-is, it just eats up all memory in a couple minutes. However, if I start commenting out most branches of fetchInstr, then it can finish quite quickly; for example, with just the following branches remaining, synthesis takes a mere 15 seconds:

fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
    b1 <- fetch
    let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
        d = decodeOp $ bitCoerce (d2, d1, d0)
        s = decodeOp $ bitCoerce (s2, s1, s0)
        rp = decodeRP r p
        rppp = decodeRPPP r p
        cond = decodeCond (d2 :> d1 :> d0 :> Nil)
    case b1' of
        0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
        0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16
        0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
        0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
        _ -> return NOP -- "Unofficial" NOP
  where
    fetch16 = do
        lo <- fetch
        hi <- fetch
        return $ bitCoerce (hi, lo)

But as I keep re-adding more, the synthesis speed goes down and memory usage goes up quickly. For an intermediate example, this takes 56 seconds to synthesize:

fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
    b1 <- fetch
    let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
        d = decodeOp $ bitCoerce (d2, d1, d0)
        s = decodeOp $ bitCoerce (s2, s1, s0)
        rp = decodeRP r p
        rppp = decodeRPPP r p
        cond = decodeCond (d2 :> d1 :> d0 :> Nil)
    case b1' of
        0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
        0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16

        0 :> 0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 0 :> Nil -> LDA <$> fetch16
        0 :> 0 :> 1 :> 1 :> 0 :> 0 :> 1 :> 0 :> Nil -> STA <$> fetch16
        0 :> 0 :> 1 :> 0 :> 1 :> 0 :> 1 :> 0 :> Nil -> LHLD <$> fetch16
        0 :> 0 :> 1 :> 0 :> 0 :> 0 :> 1 :> 0 :> Nil -> SHLD <$> fetch16
        0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 0 :> Nil -> return $ LDAX rp
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 0 :> Nil -> return $ STAX rp
        1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> Nil -> return XCHG

        1 :> 0 :> 0 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU ADD (Op s)
        1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU ADD <$> Imm <$> fetch
        1 :> 0 :> 0 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU ADC (Op s)
        1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU ADC <$> Imm <$> fetch

        0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
        0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
        _ -> return NOP -- "Unofficial" NOP

Full code with no external dependencies:

{-# LANGUAGE RecordWildCards, ApplicativeDo #-}
{-# LANGUAGE GeneralizedNewtypeDeriving, DerivingStrategies #-}
module Bug where

import Clash.Prelude hiding (lift)
import Control.Monad.Trans.Except
import Control.Monad.Reader
import Control.Monad.State
import Data.Maybe (fromMaybe)
import Data.Monoid
import Control.Monad.RWS

mealyState :: (HiddenClockReset domain gated synchronous, Undefined s)
           => (i -> State s o) -> s -> (Signal domain i -> Signal domain o)
mealyState f = mealy step
  where
    step s x = let (y, s') = runState (f x) s
               in (s', y)

type NumRegs = 8
type Reg = Index NumRegs

data Failure
    = Underrun
    | Overrun
    deriving Show

data Buffer n dat = Buffer
    { bufferContents :: Vec n dat
    , bufferNext :: Index (1 + n)
    }
    deriving (Show, Generic, Undefined)

instance (KnownNat n, Default dat) => Default (Buffer n dat) where
    def = Buffer (pure def) 0

remember :: (KnownNat n) => Buffer n dat -> dat -> Buffer n dat
remember Buffer{..} x = Buffer
    { bufferContents = replace bufferNext x bufferContents
    , bufferNext = bufferNext + 1
    }

newtype FetchM n dat m a = FetchM{ unFetchM :: ReaderT (Buffer n dat) (StateT (Index (1 + n)) (ExceptT Failure m)) a }
    deriving newtype (Functor, Applicative, Monad)

runFetchM :: (Monad m, KnownNat n) => Buffer n dat -> FetchM n dat m a -> m (Either Failure a)
runFetchM buf act = runExceptT $ evalStateT (runReaderT (unFetchM act) buf) 0

fetch :: (Monad m, KnownNat n) => FetchM n dat m dat
fetch = do
    Buffer{..} <- FetchM ask
    idx <- FetchM get
    when (idx == maxBound) overrun
    when (idx >= bufferNext) underrun
    FetchM $ put $ idx + 1
    return $ bufferContents !! idx
  where
    overrun = FetchM . lift . lift . throwE $ Overrun
    underrun = FetchM . lift . lift . throwE $ Underrun

rA, rFlags, rB, rC, rD, rE, rH, rL :: Reg
rA = 7
rFlags = 6
rB = 0
rC = 1
rD = 2
rE = 3
rH = 4
rL = 5

data RegPair = Regs Reg Reg | SP
    deriving (Eq, Ord, Show, Generic, Undefined)

rBC, rDE, rHL :: RegPair
rBC = Regs rB rC
rDE = Regs rD rE
rHL = Regs rH rL

type Flag = Index 8

fS, fZ, fA, fP, fC :: Flag
fS = 7
fZ = 6
fA = 4
fP = 2
fC = 0

data Op
    = Reg Reg
    | AddrHL
    deriving (Eq, Ord, Show, Generic, Undefined)

type Value = Unsigned 8
type Addr = Unsigned 16
type Port = Unsigned 8

data Src
    = Op Op
    | Imm Value
    deriving (Eq, Ord, Show, Generic, Undefined)

data ALU = ADD | ADC | SUB | SBB | AND | OR | XOR | CMP
    deriving (Eq, Ord, Show, Enum, Bounded, Generic, Undefined)

data Cond = Cond Flag Bool
    deriving (Eq, Ord, Show, Generic, Undefined)

data Instr
    = MOV Op Src
    | LXI RegPair (Unsigned 16)
    | LDA Addr
    | STA Addr
    | LHLD Addr
    | SHLD Addr
    | LDAX RegPair
    | STAX RegPair
    | XCHG
    | ALU ALU Src
    | INR Op
    | DCR Op
    | INX RegPair
    | DCX RegPair
    | DAD RegPair
    | DAA
    | RLC
    | RRC
    | RAL
    | RAR
    | CMA
    | CMC
    | STC
    | JMP Addr
    | JMPIf Cond Addr
    | CALL Addr
    | CALLIf Cond Addr
    | RET
    | RETIf Cond
    | RST (Unsigned 3)
    | PCHL
    | PUSH RegPair
    | POP RegPair
    | XTHL
    | SPHL
    | IN Port
    | OUT Port
    | INT Bool
    | HLT
    | NOP
    deriving (Eq, Ord, Show, Generic, Undefined)

fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
    b1 <- fetch
    let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
        d = decodeOp $ bitCoerce (d2, d1, d0)
        s = decodeOp $ bitCoerce (s2, s1, s0)
        rp = decodeRP r p
        rppp = decodeRPPP r p
        cond = decodeCond (d2 :> d1 :> d0 :> Nil)
    case b1' of
        0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
        0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16

        0 :> 0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 0 :> Nil -> LDA <$> fetch16
        0 :> 0 :> 1 :> 1 :> 0 :> 0 :> 1 :> 0 :> Nil -> STA <$> fetch16
        0 :> 0 :> 1 :> 0 :> 1 :> 0 :> 1 :> 0 :> Nil -> LHLD <$> fetch16
        0 :> 0 :> 1 :> 0 :> 0 :> 0 :> 1 :> 0 :> Nil -> SHLD <$> fetch16
        0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 0 :> Nil -> return $ LDAX rp
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 0 :> Nil -> return $ STAX rp
        1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> Nil -> return XCHG

        1 :> 0 :> 0 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU ADD (Op s)
        1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU ADD <$> Imm <$> fetch
        1 :> 0 :> 0 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU ADC (Op s)
        1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU ADC <$> Imm <$> fetch

        1 :> 0 :> 0 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU SUB (Op s)
        1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU SUB <$> Imm <$> fetch
        1 :> 0 :> 0 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU SBB (Op s)
        1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU SBB <$> Imm <$> fetch

        1 :> 0 :> 1 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU AND (Op s)
        1 :> 1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU AND <$> Imm <$> fetch

        1 :> 0 :> 1 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU OR (Op s)
        1 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU OR <$> Imm <$> fetch

        1 :> 0 :> 1 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU XOR (Op s)
        1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU XOR <$> Imm <$> fetch

        1 :> 0 :> 1 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU CMP (Op s)
        1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU CMP <$> Imm <$> fetch

        0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 0 :> Nil -> return $ INR d
        0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 1 :> Nil -> return $ DCR d
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 1 :> Nil -> return $ INX rp
        0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 1 :> Nil -> return $ DCX rp

        0 :> 0 :> _ :> _ :> 1 :> 0 :> 0 :> 1 :> Nil -> return $ DAD rp
        0 :> 0 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> Nil -> return DAA

        0 :> 0 :> 0 :> 0 :> 0 :> 1 :> 1 :> 1 :> Nil -> return RLC
        0 :> 0 :> 0 :> 0 :> 1 :> 1 :> 1 :> 1 :> Nil -> return RRC
        0 :> 0 :> 0 :> 1 :> 0 :> 1 :> 1 :> 1 :> Nil -> return RAL
        0 :> 0 :> 0 :> 1 :> 1 :> 1 :> 1 :> 1 :> Nil -> return RAR

        0 :> 0 :> 1 :> 0 :> 1 :> 1 :> 1 :> 1 :> Nil -> return CMA
        0 :> 0 :> 1 :> 1 :> 1 :> 1 :> 1 :> 1 :> Nil -> return CMC
        0 :> 0 :> 1 :> 1 :> 0 :> 1 :> 1 :> 1 :> Nil -> return STC

        1 :> 1 :> 0 :> 0 :> 0 :> 0 :> 1 :> 1 :> Nil -> JMP <$> fetch16
        1 :> 1 :> _ :> _ :> _ :> 0 :> 1 :> 0 :> Nil -> JMPIf cond <$> fetch16
        1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 0 :> 1 :> Nil -> CALL <$> fetch16
        1 :> 1 :> _ :> _ :> _ :> 1 :> 0 :> 0 :> Nil -> CALLIf cond <$> fetch16
        1 :> 1 :> 0 :> 0 :> 1 :> 0 :> 0 :> 1 :> Nil -> return RET
        1 :> 1 :> _ :> _ :> _ :> 0 :> 0 :> 0 :> Nil -> return $ RETIf cond

        1 :> 1 :> 0 :> 1 :> 1 :> 0 :> 1 :> 1 :> Nil -> IN <$> fetch
        1 :> 1 :> 0 :> 1 :> 0 :> 0 :> 1 :> 1 :> Nil -> OUT <$> fetch

        1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 0 :> 1 :> Nil -> return PCHL
        1 :> 1 :> _ :> _ :> 0 :> 1 :> 0 :> 1 :> Nil -> return $ PUSH rppp
        1 :> 1 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> return $ POP rppp
        1 :> 1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> Nil -> return XTHL
        1 :> 1 :> 1 :> 1 :> 1 :> 0 :> 0 :> 1 :> Nil -> return SPHL

        1 :> 1 :> 1 :> 1 :> b :> 0 :> 1 :> 1 :> Nil -> return $ INT $ bitToBool b
        1 :> 1 :> _ :> _ :> _ :> 1 :> 1 :> 1 :> Nil -> return $ RST $ bitCoerce $ d2 :> d1 :> d0 :> Nil
        0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
        0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
        _ -> return NOP -- "Unofficial" NOP
  where
    fetch16 = do
        lo <- fetch
        hi <- fetch
        return $ bitCoerce (hi, lo)


newtype CPU i s o a = CPU{ unCPU :: ExceptT () (RWS i (Last o) s) a }
    deriving newtype (Functor, Applicative, Monad, MonadState s)

input :: CPU i s o i
input = CPU ask

output :: o -> CPU i s o ()
output = CPU . tell . pure

abort :: CPU i s o a
abort = CPU $ throwE ()

runCPU :: (s -> o) -> CPU i s o () -> (i -> State s o)
runCPU mkDef cpu inp = do
    s <- get
    let (s', writes) = execRWS (runExceptT $ unCPU cpu) inp s
    put s'
    def <- gets mkDef
    return $ fromMaybe def $ getLast writes

runCPUDebug :: (s -> o) -> CPU i s o () -> (i -> State s (s, o))
runCPUDebug mkDef cpu inp = do
    s0 <- get
    out <- runCPU mkDef cpu inp
    return (s0, out)


data Phase
    = Init
    | Fetching Bool (Buffer 3 Value)
    deriving (Show, Generic, Undefined)

data CPUIn = CPUIn
    { cpuInMem :: Value
    }
    deriving (Show)

data CPUState = CPUState
    { phase, prevPhase :: Phase
    , pc :: Addr
    , instrBuf :: Instr
    }
    deriving (Show, Generic, Undefined)

initState :: CPUState
initState = CPUState
    { phase = Init
    , prevPhase = Init
    , pc = 0x0000
    , instrBuf = NOP
    }

defaultOut :: CPUState -> Addr
defaultOut CPUState{..} = pc

type M = CPU CPUIn CPUState Addr

cpu :: M ()
cpu = do
    CPUIn{..} <- input
    CPUState{..} <- get

    -- trace (printf "%04x: %s" (fromIntegral pc :: Word16) (show phase)) $ return ()
    case phase of
        Init -> goto $ Fetching False def
        Fetching interrupting buf -> do
            buf' <- remember buf <$> do
                unless interrupting $ setPC $ pc + 1
                return cpuInMem
            instr_ <- runFetchM buf' $ fetchInstr fetch
            instr <- case instr_ of
                Left Underrun -> goto (Fetching interrupting buf') >> abort
                Left Overrun -> error "Overrun"
                Right instr -> return instr
            modify $ \s -> s{ instrBuf = instr }
            goto $ Fetching False def
            -- trace (printf "%04x: %s" (fromIntegral pc :: Word16) (show instr)) $ return ()
            exec instr
  where
    exec NOP = return ()
    exec instr = error $ show instr

goto :: Phase -> M ()
goto ph = modify $ \s -> s{ prevPhase = phase s, phase = ph }

getPC :: M Addr
getPC = gets pc

setPC :: Addr -> M ()
setPC pc = modify $ \s -> s{ pc = pc }

decodeOp :: Reg -> Op
decodeOp 6 = AddrHL
decodeOp reg = Reg reg

decodeCond :: Vec 3 Bit -> Cond
decodeCond cond = Cond flag b
  where
    (flag0, b) = unpack (pack cond) :: (Unsigned 2, Bool)
    flag = case flag0 of
        0b00 -> fZ
        0b01 -> fC
        0b10 -> fP
        0b11 -> fS

decodeRP :: Bit -> Bit -> RegPair
decodeRP 0 0 = Regs rB rC
decodeRP 0 1 = Regs rD rE
decodeRP 1 0 = Regs rH rL
decodeRP 1 1 = SP

decodeRPPP :: Bit -> Bit -> RegPair
decodeRPPP 1 1 = Regs rA rFlags
decodeRPPP r p = decodeRP r p

{-# NOINLINE topEntity #-}
{-# ANN topEntity
  (Synthesize
    { t_name   = "SpaceInvaders"
    , t_inputs =
          [ PortName "CLK_25MHZ"
          , PortName "RESET"
          ]
    , t_output = PortName "ADDR"
    }) #-}
topEntity
    :: Clock System Source
    -> Reset System Asynchronous
    -> Signal System Addr
topEntity = exposeClockReset mainBoard

mainBoard
    :: (HiddenClockReset domain gated synchronous)
    => Signal domain Addr
mainBoard = cpuOut
  where
    cpuOut = mealyState (runCPU defaultOut cpu) initState cpuIn

    progROM addr = unpack <$> romFile (SNat @13) "image.hex" (truncateB <$> addr)
    memRead = progROM
    read = memRead cpuOut

    cpuIn = do
        cpuInMem <- read
        pure CPUIn{..}

Jun 07 '19 11:06 gergoerdi

As an additional data point, keeping the following results in synthesis time of 5m34s, memory peaking somewhere around 8G:

fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
    b1 <- fetch
    let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
        d = decodeOp $ bitCoerce (d2, d1, d0)
        s = decodeOp $ bitCoerce (s2, s1, s0)
        rp = decodeRP r p
        rppp = decodeRPPP r p
        cond = decodeCond (d2 :> d1 :> d0 :> Nil)
    case b1' of
        0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
        0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16

        0 :> 0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 0 :> Nil -> LDA <$> fetch16
        0 :> 0 :> 1 :> 1 :> 0 :> 0 :> 1 :> 0 :> Nil -> STA <$> fetch16
        0 :> 0 :> 1 :> 0 :> 1 :> 0 :> 1 :> 0 :> Nil -> LHLD <$> fetch16
        0 :> 0 :> 1 :> 0 :> 0 :> 0 :> 1 :> 0 :> Nil -> SHLD <$> fetch16
        0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 0 :> Nil -> return $ LDAX rp
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 0 :> Nil -> return $ STAX rp
        1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> Nil -> return XCHG

        1 :> 0 :> 0 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU ADD (Op s)
        1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU ADD <$> Imm <$> fetch
        1 :> 0 :> 0 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU ADC (Op s)
        1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU ADC <$> Imm <$> fetch

        1 :> 0 :> 0 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU SUB (Op s)
        1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU SUB <$> Imm <$> fetch
        1 :> 0 :> 0 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU SBB (Op s)
        1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU SBB <$> Imm <$> fetch

        1 :> 0 :> 1 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU AND (Op s)
        1 :> 1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU AND <$> Imm <$> fetch

        1 :> 0 :> 1 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU OR (Op s)
        1 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU OR <$> Imm <$> fetch

        1 :> 0 :> 1 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU XOR (Op s)
        1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU XOR <$> Imm <$> fetch

        1 :> 0 :> 1 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU CMP (Op s)
        1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU CMP <$> Imm <$> fetch

        0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 0 :> Nil -> return $ INR d
        0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 1 :> Nil -> return $ DCR d
        0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 1 :> Nil -> return $ INX rp
        0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 1 :> Nil -> return $ DCX rp

        0 :> 0 :> _ :> _ :> 1 :> 0 :> 0 :> 1 :> Nil -> return $ DAD rp
        0 :> 0 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> Nil -> return DAA

        0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
        0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
        _ -> return NOP -- "Unofficial" NOP

Jun 07 '19 11:06 gergoerdi

I'm guessing this is due to every :> representing a different (GADT) case. I don't think it'll be trivial to speed this case up. (I won't be able to look at in detail til at least after the weekend, though, so I'm not sure.) Could you try:

{-# LANGUAGE ViewPatterns #-}

[..]

    case b1 of
        --     mask          value
        ((.&.) 0b11000000 -> 0b01000000) -> return $ MOV d (Op s)
        ((.&.) 0b11000111 -> 0b01000110) -> MOV d <$> Imm <$> fetch
[..]

This should be much quicker in simulation, as well as Clash compilation. If that works, I guess you could write a template haskell function as a workaround..

    case b1 of
        --     mask          value
        $(casex "01xxxxxx") -> return $ MOV d (Op s)
        $(casex "01xxx110") -> MOV d <$> Imm <$> fetch
[..]

I'm more than happy to write it for you if the first example actually solves your issues.

Jun 07 '19 12:06 martijnbastiaan

So does that mean it should also work to deconstruct it into a (Bit, Bit, Bit, Bit, Bit, Bit, Bit, Bit) and pattern-match on that? I'd prefer that to manual bitmasking.

Jun 07 '19 13:06 gergoerdi

So a quick experiment shows the 8-tuple of Bits has the same problem as the Vec 8 Bit, even though it's not a GADT. I guess because it is still compiled to nested pattern matches coordinate-by-coordinate.

Your TH idea looks like a pragmatic compromise, so I am going to go with that for now. For reference, my implementation is:

import Clash.Prelude
import Language.Haskell.TH
import Data.List as L
import Data.Maybe (fromMaybe)

fromBits :: [Bit] -> Integer
fromBits = L.foldl (\v b -> v `shiftL` 1 .|. fromIntegral b) 0

bitMask :: String -> Q Pat
bitMask s = [p| (($mask .&.) -> $pattern) |]
  where
    bs = parse <$> s

    mask = litE . IntegerL . fromBits $ maybe 0 (const 1) <$> bs
    pattern = litP . IntegerL . fromBits $ fromMaybe 0 <$> bs

    parse 'x' = Nothing
    parse '_' = Nothing
    parse '0' = Just 0
    parse '1' = Just 1

Jun 07 '19 13:06 gergoerdi

Thanks for posting the snippet @gergoerdi. Embarrassingly, I didn't know you could use Q Pat, so I was hacking on an uglier approach. Could you confirm the workaround works? I'll suggest it for inclusion in the Prelude if it does.

Jun 07 '19 13:06 martijnbastiaan

For my stripped-down test case from this ticket, it seems the bitmask view patterns do indeed work:

GHC+Clash: Loading modules cumulatively took 13.849s
Clash: Parsing and compiling primitives took 0.623s
Clash: Compiling Bug.topEntity
Clash: Applied 1762 transformations
Clash: Normalisation took 11.658s
Clash: Netlist generation took 0.292s
WARNING: file "image.hex" does not exist
Clash: Total compilation took 26.042s

Unfortunately, my original code is still synthesizing: it's been running for 10 minutes now, but its memory usage is steady at just below 4 GB so I'll keep it running for a bit.

I'm happy to prepare a pull request for clash-prelude with the bitMask macro; would Clash.Prelude.TH.BitMask work for the name? Or Clash.Prelude.BitMask.TH? Or just Clash.Prelude.BitMask?

Jun 07 '19 13:06 gergoerdi

GHC+Clash: Loading modules cumulatively took 24.404s
Clash: Parsing and compiling primitives took 0.647s
Clash: Compiling SpaceInvaders.topEntity
Clash: Applied 74909 transformations
Clash: Normalisation took 15m40.465s
Clash: Netlist generation took 5.340s
Clash: Total compilation took 16m15.785s
Build completed in 16m16s

Jun 07 '19 14:06 gergoerdi

I'd be fine with having it in Clash.Sized.Internal.BitVector and exporting it in Clash.Sized.BitVector, Clash.Prelude, and Clash.Explicit.Prelude. To keep it consistent with bLit, we'd need to accept . instead of _/x. I'm wondering if there's a more obvious name than bitMask..

Jun 07 '19 14:06 martijnbastiaan

This module might also be interesting: https://github.com/clash-lang/clash-compiler/blob/master/clash-prelude/src/Clash/Annotations/BitRepresentation.hs

Jun 07 '19 14:06 martijnbastiaan

bitPattern maybe?

Jun 07 '19 14:06 gergoerdi

Perfect.

Jun 07 '19 14:06 martijnbastiaan

One option to actually address the core of this issue is to recognize when a pattern is "bit constant", and then replace it with a BitPat - which we'd need to add to Pat:

https://github.com/clash-lang/clash-compiler/blob/f4a5d1752e2ea5df2d104ce2e75c46f76db58f2a/clash-lib/src/Clash/Core/Term.hs#L65-L73

If all alternatives of a case are BitPats, we could even print a nice casex in the resulting HDL.

Jun 07 '19 20:06 martijnbastiaan

clash-compiler clash-compiler copied to clipboard

Bad performance when pattern-matching bit by bit

clash-compiler
clash-compiler copied to clipboard