clash-compiler
clash-compiler copied to clipboard
Bad performance when pattern-matching bit by bit
Please find a full standalone compilable code at the end of this post.
I am experiencing very bad scaling from the CLaSH synthesizer when it is compiling the instruction fetch/decode function for an Intel 8080. If I try to compile my code as-is, it just eats up all memory in a couple minutes. However, if I start commenting out most branches of fetchInstr
, then it can finish quite quickly; for example, with just the following branches remaining, synthesis takes a mere 15 seconds:
fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
b1 <- fetch
let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
d = decodeOp $ bitCoerce (d2, d1, d0)
s = decodeOp $ bitCoerce (s2, s1, s0)
rp = decodeRP r p
rppp = decodeRPPP r p
cond = decodeCond (d2 :> d1 :> d0 :> Nil)
case b1' of
0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16
0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
_ -> return NOP -- "Unofficial" NOP
where
fetch16 = do
lo <- fetch
hi <- fetch
return $ bitCoerce (hi, lo)
But as I keep re-adding more, the synthesis speed goes down and memory usage goes up quickly. For an intermediate example, this takes 56 seconds to synthesize:
fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
b1 <- fetch
let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
d = decodeOp $ bitCoerce (d2, d1, d0)
s = decodeOp $ bitCoerce (s2, s1, s0)
rp = decodeRP r p
rppp = decodeRPPP r p
cond = decodeCond (d2 :> d1 :> d0 :> Nil)
case b1' of
0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16
0 :> 0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 0 :> Nil -> LDA <$> fetch16
0 :> 0 :> 1 :> 1 :> 0 :> 0 :> 1 :> 0 :> Nil -> STA <$> fetch16
0 :> 0 :> 1 :> 0 :> 1 :> 0 :> 1 :> 0 :> Nil -> LHLD <$> fetch16
0 :> 0 :> 1 :> 0 :> 0 :> 0 :> 1 :> 0 :> Nil -> SHLD <$> fetch16
0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 0 :> Nil -> return $ LDAX rp
0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 0 :> Nil -> return $ STAX rp
1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> Nil -> return XCHG
1 :> 0 :> 0 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU ADD (Op s)
1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU ADD <$> Imm <$> fetch
1 :> 0 :> 0 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU ADC (Op s)
1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU ADC <$> Imm <$> fetch
0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
_ -> return NOP -- "Unofficial" NOP
Full code with no external dependencies:
{-# LANGUAGE RecordWildCards, ApplicativeDo #-}
{-# LANGUAGE GeneralizedNewtypeDeriving, DerivingStrategies #-}
module Bug where
import Clash.Prelude hiding (lift)
import Control.Monad.Trans.Except
import Control.Monad.Reader
import Control.Monad.State
import Data.Maybe (fromMaybe)
import Data.Monoid
import Control.Monad.RWS
mealyState :: (HiddenClockReset domain gated synchronous, Undefined s)
=> (i -> State s o) -> s -> (Signal domain i -> Signal domain o)
mealyState f = mealy step
where
step s x = let (y, s') = runState (f x) s
in (s', y)
type NumRegs = 8
type Reg = Index NumRegs
data Failure
= Underrun
| Overrun
deriving Show
data Buffer n dat = Buffer
{ bufferContents :: Vec n dat
, bufferNext :: Index (1 + n)
}
deriving (Show, Generic, Undefined)
instance (KnownNat n, Default dat) => Default (Buffer n dat) where
def = Buffer (pure def) 0
remember :: (KnownNat n) => Buffer n dat -> dat -> Buffer n dat
remember Buffer{..} x = Buffer
{ bufferContents = replace bufferNext x bufferContents
, bufferNext = bufferNext + 1
}
newtype FetchM n dat m a = FetchM{ unFetchM :: ReaderT (Buffer n dat) (StateT (Index (1 + n)) (ExceptT Failure m)) a }
deriving newtype (Functor, Applicative, Monad)
runFetchM :: (Monad m, KnownNat n) => Buffer n dat -> FetchM n dat m a -> m (Either Failure a)
runFetchM buf act = runExceptT $ evalStateT (runReaderT (unFetchM act) buf) 0
fetch :: (Monad m, KnownNat n) => FetchM n dat m dat
fetch = do
Buffer{..} <- FetchM ask
idx <- FetchM get
when (idx == maxBound) overrun
when (idx >= bufferNext) underrun
FetchM $ put $ idx + 1
return $ bufferContents !! idx
where
overrun = FetchM . lift . lift . throwE $ Overrun
underrun = FetchM . lift . lift . throwE $ Underrun
rA, rFlags, rB, rC, rD, rE, rH, rL :: Reg
rA = 7
rFlags = 6
rB = 0
rC = 1
rD = 2
rE = 3
rH = 4
rL = 5
data RegPair = Regs Reg Reg | SP
deriving (Eq, Ord, Show, Generic, Undefined)
rBC, rDE, rHL :: RegPair
rBC = Regs rB rC
rDE = Regs rD rE
rHL = Regs rH rL
type Flag = Index 8
fS, fZ, fA, fP, fC :: Flag
fS = 7
fZ = 6
fA = 4
fP = 2
fC = 0
data Op
= Reg Reg
| AddrHL
deriving (Eq, Ord, Show, Generic, Undefined)
type Value = Unsigned 8
type Addr = Unsigned 16
type Port = Unsigned 8
data Src
= Op Op
| Imm Value
deriving (Eq, Ord, Show, Generic, Undefined)
data ALU = ADD | ADC | SUB | SBB | AND | OR | XOR | CMP
deriving (Eq, Ord, Show, Enum, Bounded, Generic, Undefined)
data Cond = Cond Flag Bool
deriving (Eq, Ord, Show, Generic, Undefined)
data Instr
= MOV Op Src
| LXI RegPair (Unsigned 16)
| LDA Addr
| STA Addr
| LHLD Addr
| SHLD Addr
| LDAX RegPair
| STAX RegPair
| XCHG
| ALU ALU Src
| INR Op
| DCR Op
| INX RegPair
| DCX RegPair
| DAD RegPair
| DAA
| RLC
| RRC
| RAL
| RAR
| CMA
| CMC
| STC
| JMP Addr
| JMPIf Cond Addr
| CALL Addr
| CALLIf Cond Addr
| RET
| RETIf Cond
| RST (Unsigned 3)
| PCHL
| PUSH RegPair
| POP RegPair
| XTHL
| SPHL
| IN Port
| OUT Port
| INT Bool
| HLT
| NOP
deriving (Eq, Ord, Show, Generic, Undefined)
fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
b1 <- fetch
let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
d = decodeOp $ bitCoerce (d2, d1, d0)
s = decodeOp $ bitCoerce (s2, s1, s0)
rp = decodeRP r p
rppp = decodeRPPP r p
cond = decodeCond (d2 :> d1 :> d0 :> Nil)
case b1' of
0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16
0 :> 0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 0 :> Nil -> LDA <$> fetch16
0 :> 0 :> 1 :> 1 :> 0 :> 0 :> 1 :> 0 :> Nil -> STA <$> fetch16
0 :> 0 :> 1 :> 0 :> 1 :> 0 :> 1 :> 0 :> Nil -> LHLD <$> fetch16
0 :> 0 :> 1 :> 0 :> 0 :> 0 :> 1 :> 0 :> Nil -> SHLD <$> fetch16
0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 0 :> Nil -> return $ LDAX rp
0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 0 :> Nil -> return $ STAX rp
1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> Nil -> return XCHG
1 :> 0 :> 0 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU ADD (Op s)
1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU ADD <$> Imm <$> fetch
1 :> 0 :> 0 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU ADC (Op s)
1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU ADC <$> Imm <$> fetch
1 :> 0 :> 0 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU SUB (Op s)
1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU SUB <$> Imm <$> fetch
1 :> 0 :> 0 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU SBB (Op s)
1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU SBB <$> Imm <$> fetch
1 :> 0 :> 1 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU AND (Op s)
1 :> 1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU AND <$> Imm <$> fetch
1 :> 0 :> 1 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU OR (Op s)
1 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU OR <$> Imm <$> fetch
1 :> 0 :> 1 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU XOR (Op s)
1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU XOR <$> Imm <$> fetch
1 :> 0 :> 1 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU CMP (Op s)
1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU CMP <$> Imm <$> fetch
0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 0 :> Nil -> return $ INR d
0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 1 :> Nil -> return $ DCR d
0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 1 :> Nil -> return $ INX rp
0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 1 :> Nil -> return $ DCX rp
0 :> 0 :> _ :> _ :> 1 :> 0 :> 0 :> 1 :> Nil -> return $ DAD rp
0 :> 0 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> Nil -> return DAA
0 :> 0 :> 0 :> 0 :> 0 :> 1 :> 1 :> 1 :> Nil -> return RLC
0 :> 0 :> 0 :> 0 :> 1 :> 1 :> 1 :> 1 :> Nil -> return RRC
0 :> 0 :> 0 :> 1 :> 0 :> 1 :> 1 :> 1 :> Nil -> return RAL
0 :> 0 :> 0 :> 1 :> 1 :> 1 :> 1 :> 1 :> Nil -> return RAR
0 :> 0 :> 1 :> 0 :> 1 :> 1 :> 1 :> 1 :> Nil -> return CMA
0 :> 0 :> 1 :> 1 :> 1 :> 1 :> 1 :> 1 :> Nil -> return CMC
0 :> 0 :> 1 :> 1 :> 0 :> 1 :> 1 :> 1 :> Nil -> return STC
1 :> 1 :> 0 :> 0 :> 0 :> 0 :> 1 :> 1 :> Nil -> JMP <$> fetch16
1 :> 1 :> _ :> _ :> _ :> 0 :> 1 :> 0 :> Nil -> JMPIf cond <$> fetch16
1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 0 :> 1 :> Nil -> CALL <$> fetch16
1 :> 1 :> _ :> _ :> _ :> 1 :> 0 :> 0 :> Nil -> CALLIf cond <$> fetch16
1 :> 1 :> 0 :> 0 :> 1 :> 0 :> 0 :> 1 :> Nil -> return RET
1 :> 1 :> _ :> _ :> _ :> 0 :> 0 :> 0 :> Nil -> return $ RETIf cond
1 :> 1 :> 0 :> 1 :> 1 :> 0 :> 1 :> 1 :> Nil -> IN <$> fetch
1 :> 1 :> 0 :> 1 :> 0 :> 0 :> 1 :> 1 :> Nil -> OUT <$> fetch
1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 0 :> 1 :> Nil -> return PCHL
1 :> 1 :> _ :> _ :> 0 :> 1 :> 0 :> 1 :> Nil -> return $ PUSH rppp
1 :> 1 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> return $ POP rppp
1 :> 1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> Nil -> return XTHL
1 :> 1 :> 1 :> 1 :> 1 :> 0 :> 0 :> 1 :> Nil -> return SPHL
1 :> 1 :> 1 :> 1 :> b :> 0 :> 1 :> 1 :> Nil -> return $ INT $ bitToBool b
1 :> 1 :> _ :> _ :> _ :> 1 :> 1 :> 1 :> Nil -> return $ RST $ bitCoerce $ d2 :> d1 :> d0 :> Nil
0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
_ -> return NOP -- "Unofficial" NOP
where
fetch16 = do
lo <- fetch
hi <- fetch
return $ bitCoerce (hi, lo)
newtype CPU i s o a = CPU{ unCPU :: ExceptT () (RWS i (Last o) s) a }
deriving newtype (Functor, Applicative, Monad, MonadState s)
input :: CPU i s o i
input = CPU ask
output :: o -> CPU i s o ()
output = CPU . tell . pure
abort :: CPU i s o a
abort = CPU $ throwE ()
runCPU :: (s -> o) -> CPU i s o () -> (i -> State s o)
runCPU mkDef cpu inp = do
s <- get
let (s', writes) = execRWS (runExceptT $ unCPU cpu) inp s
put s'
def <- gets mkDef
return $ fromMaybe def $ getLast writes
runCPUDebug :: (s -> o) -> CPU i s o () -> (i -> State s (s, o))
runCPUDebug mkDef cpu inp = do
s0 <- get
out <- runCPU mkDef cpu inp
return (s0, out)
data Phase
= Init
| Fetching Bool (Buffer 3 Value)
deriving (Show, Generic, Undefined)
data CPUIn = CPUIn
{ cpuInMem :: Value
}
deriving (Show)
data CPUState = CPUState
{ phase, prevPhase :: Phase
, pc :: Addr
, instrBuf :: Instr
}
deriving (Show, Generic, Undefined)
initState :: CPUState
initState = CPUState
{ phase = Init
, prevPhase = Init
, pc = 0x0000
, instrBuf = NOP
}
defaultOut :: CPUState -> Addr
defaultOut CPUState{..} = pc
type M = CPU CPUIn CPUState Addr
cpu :: M ()
cpu = do
CPUIn{..} <- input
CPUState{..} <- get
-- trace (printf "%04x: %s" (fromIntegral pc :: Word16) (show phase)) $ return ()
case phase of
Init -> goto $ Fetching False def
Fetching interrupting buf -> do
buf' <- remember buf <$> do
unless interrupting $ setPC $ pc + 1
return cpuInMem
instr_ <- runFetchM buf' $ fetchInstr fetch
instr <- case instr_ of
Left Underrun -> goto (Fetching interrupting buf') >> abort
Left Overrun -> error "Overrun"
Right instr -> return instr
modify $ \s -> s{ instrBuf = instr }
goto $ Fetching False def
-- trace (printf "%04x: %s" (fromIntegral pc :: Word16) (show instr)) $ return ()
exec instr
where
exec NOP = return ()
exec instr = error $ show instr
goto :: Phase -> M ()
goto ph = modify $ \s -> s{ prevPhase = phase s, phase = ph }
getPC :: M Addr
getPC = gets pc
setPC :: Addr -> M ()
setPC pc = modify $ \s -> s{ pc = pc }
decodeOp :: Reg -> Op
decodeOp 6 = AddrHL
decodeOp reg = Reg reg
decodeCond :: Vec 3 Bit -> Cond
decodeCond cond = Cond flag b
where
(flag0, b) = unpack (pack cond) :: (Unsigned 2, Bool)
flag = case flag0 of
0b00 -> fZ
0b01 -> fC
0b10 -> fP
0b11 -> fS
decodeRP :: Bit -> Bit -> RegPair
decodeRP 0 0 = Regs rB rC
decodeRP 0 1 = Regs rD rE
decodeRP 1 0 = Regs rH rL
decodeRP 1 1 = SP
decodeRPPP :: Bit -> Bit -> RegPair
decodeRPPP 1 1 = Regs rA rFlags
decodeRPPP r p = decodeRP r p
{-# NOINLINE topEntity #-}
{-# ANN topEntity
(Synthesize
{ t_name = "SpaceInvaders"
, t_inputs =
[ PortName "CLK_25MHZ"
, PortName "RESET"
]
, t_output = PortName "ADDR"
}) #-}
topEntity
:: Clock System Source
-> Reset System Asynchronous
-> Signal System Addr
topEntity = exposeClockReset mainBoard
mainBoard
:: (HiddenClockReset domain gated synchronous)
=> Signal domain Addr
mainBoard = cpuOut
where
cpuOut = mealyState (runCPU defaultOut cpu) initState cpuIn
progROM addr = unpack <$> romFile (SNat @13) "image.hex" (truncateB <$> addr)
memRead = progROM
read = memRead cpuOut
cpuIn = do
cpuInMem <- read
pure CPUIn{..}
As an additional data point, keeping the following results in synthesis time of 5m34s, memory peaking somewhere around 8G:
fetchInstr :: (Monad m) => m (Unsigned 8) -> m Instr
fetchInstr fetch = do
b1 <- fetch
let b1'@(_ :> _ :> d2@r :> d1@p :> d0 :> s2 :> s1 :> s0 :> Nil) = bitCoerce b1 :: Vec 8 Bit
d = decodeOp $ bitCoerce (d2, d1, d0)
s = decodeOp $ bitCoerce (s2, s1, s0)
rp = decodeRP r p
rppp = decodeRPPP r p
cond = decodeCond (d2 :> d1 :> d0 :> Nil)
case b1' of
0 :> 1 :> _ :> _ :> _ :> _ :> _ :> _ :> Nil -> return $ MOV d (Op s)
0 :> 0 :> _ :> _ :> _ :> 1 :> 1 :> 0 :> Nil -> MOV d <$> Imm <$> fetch
0 :> 0 :> _ :> _ :> 0 :> 0 :> 0 :> 1 :> Nil -> LXI rp <$> fetch16
0 :> 0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 0 :> Nil -> LDA <$> fetch16
0 :> 0 :> 1 :> 1 :> 0 :> 0 :> 1 :> 0 :> Nil -> STA <$> fetch16
0 :> 0 :> 1 :> 0 :> 1 :> 0 :> 1 :> 0 :> Nil -> LHLD <$> fetch16
0 :> 0 :> 1 :> 0 :> 0 :> 0 :> 1 :> 0 :> Nil -> SHLD <$> fetch16
0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 0 :> Nil -> return $ LDAX rp
0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 0 :> Nil -> return $ STAX rp
1 :> 1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> Nil -> return XCHG
1 :> 0 :> 0 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU ADD (Op s)
1 :> 1 :> 0 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU ADD <$> Imm <$> fetch
1 :> 0 :> 0 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU ADC (Op s)
1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU ADC <$> Imm <$> fetch
1 :> 0 :> 0 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU SUB (Op s)
1 :> 1 :> 0 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU SUB <$> Imm <$> fetch
1 :> 0 :> 0 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU SBB (Op s)
1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU SBB <$> Imm <$> fetch
1 :> 0 :> 1 :> 0 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU AND (Op s)
1 :> 1 :> 1 :> 0 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU AND <$> Imm <$> fetch
1 :> 0 :> 1 :> 1 :> 0 :> _ :> _ :> _ :> Nil -> return $ ALU OR (Op s)
1 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> ALU OR <$> Imm <$> fetch
1 :> 0 :> 1 :> 0 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU XOR (Op s)
1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU XOR <$> Imm <$> fetch
1 :> 0 :> 1 :> 1 :> 1 :> _ :> _ :> _ :> Nil -> return $ ALU CMP (Op s)
1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 1 :> 0 :> Nil -> ALU CMP <$> Imm <$> fetch
0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 0 :> Nil -> return $ INR d
0 :> 0 :> _ :> _ :> _ :> 1 :> 0 :> 1 :> Nil -> return $ DCR d
0 :> 0 :> _ :> _ :> 0 :> 0 :> 1 :> 1 :> Nil -> return $ INX rp
0 :> 0 :> _ :> _ :> 1 :> 0 :> 1 :> 1 :> Nil -> return $ DCX rp
0 :> 0 :> _ :> _ :> 1 :> 0 :> 0 :> 1 :> Nil -> return $ DAD rp
0 :> 0 :> 1 :> 0 :> 0 :> 1 :> 1 :> 1 :> Nil -> return DAA
0 :> 1 :> 1 :> 1 :> 0 :> 1 :> 1 :> 0 :> Nil -> return HLT
0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil -> return NOP
_ -> return NOP -- "Unofficial" NOP
I'm guessing this is due to every :>
representing a different (GADT) case. I don't think it'll be trivial to speed this case up. (I won't be able to look at in detail til at least after the weekend, though, so I'm not sure.) Could you try:
{-# LANGUAGE ViewPatterns #-}
[..]
case b1 of
-- mask value
((.&.) 0b11000000 -> 0b01000000) -> return $ MOV d (Op s)
((.&.) 0b11000111 -> 0b01000110) -> MOV d <$> Imm <$> fetch
[..]
This should be much quicker in simulation, as well as Clash compilation. If that works, I guess you could write a template haskell function as a workaround..
case b1 of
-- mask value
$(casex "01xxxxxx") -> return $ MOV d (Op s)
$(casex "01xxx110") -> MOV d <$> Imm <$> fetch
[..]
I'm more than happy to write it for you if the first example actually solves your issues.
So does that mean it should also work to deconstruct it into a (Bit, Bit, Bit, Bit, Bit, Bit, Bit, Bit)
and pattern-match on that? I'd prefer that to manual bitmasking.
So a quick experiment shows the 8-tuple of Bit
s has the same problem as the Vec 8 Bit
, even though it's not a GADT. I guess because it is still compiled to nested pattern matches coordinate-by-coordinate.
Your TH idea looks like a pragmatic compromise, so I am going to go with that for now. For reference, my implementation is:
import Clash.Prelude
import Language.Haskell.TH
import Data.List as L
import Data.Maybe (fromMaybe)
fromBits :: [Bit] -> Integer
fromBits = L.foldl (\v b -> v `shiftL` 1 .|. fromIntegral b) 0
bitMask :: String -> Q Pat
bitMask s = [p| (($mask .&.) -> $pattern) |]
where
bs = parse <$> s
mask = litE . IntegerL . fromBits $ maybe 0 (const 1) <$> bs
pattern = litP . IntegerL . fromBits $ fromMaybe 0 <$> bs
parse 'x' = Nothing
parse '_' = Nothing
parse '0' = Just 0
parse '1' = Just 1
Thanks for posting the snippet @gergoerdi. Embarrassingly, I didn't know you could use Q Pat
, so I was hacking on an uglier approach. Could you confirm the workaround works? I'll suggest it for inclusion in the Prelude if it does.
For my stripped-down test case from this ticket, it seems the bitmask view patterns do indeed work:
GHC+Clash: Loading modules cumulatively took 13.849s
Clash: Parsing and compiling primitives took 0.623s
Clash: Compiling Bug.topEntity
Clash: Applied 1762 transformations
Clash: Normalisation took 11.658s
Clash: Netlist generation took 0.292s
WARNING: file "image.hex" does not exist
Clash: Total compilation took 26.042s
Unfortunately, my original code is still synthesizing: it's been running for 10 minutes now, but its memory usage is steady at just below 4 GB so I'll keep it running for a bit.
I'm happy to prepare a pull request for clash-prelude
with the bitMask
macro; would Clash.Prelude.TH.BitMask
work for the name? Or Clash.Prelude.BitMask.TH
? Or just Clash.Prelude.BitMask
?
GHC+Clash: Loading modules cumulatively took 24.404s
Clash: Parsing and compiling primitives took 0.647s
Clash: Compiling SpaceInvaders.topEntity
Clash: Applied 74909 transformations
Clash: Normalisation took 15m40.465s
Clash: Netlist generation took 5.340s
Clash: Total compilation took 16m15.785s
Build completed in 16m16s
I'd be fine with having it in Clash.Sized.Internal.BitVector
and exporting it in Clash.Sized.BitVector
, Clash.Prelude
, and Clash.Explicit.Prelude
. To keep it consistent with bLit
, we'd need to accept .
instead of _
/x
. I'm wondering if there's a more obvious name than bitMask
..
This module might also be interesting: https://github.com/clash-lang/clash-compiler/blob/master/clash-prelude/src/Clash/Annotations/BitRepresentation.hs
bitPattern
maybe?
Perfect.
One option to actually address the core of this issue is to recognize when a pattern is "bit constant", and then replace it with a BitPat
- which we'd need to add to Pat
:
https://github.com/clash-lang/clash-compiler/blob/f4a5d1752e2ea5df2d104ce2e75c46f76db58f2a/clash-lib/src/Clash/Core/Term.hs#L65-L73
If all alternatives of a case are BitPat
s, we could even print a nice casex
in the resulting HDL.