bake
How to run tests in parallel with multiple clients?
I guess everything is in the question but let's elaborate a bit:
- The build process is a "complex" pipeline made of several stages: some can be run in parallel (e.g. integration tests, end-to-end tests, UI tests), while others must be sequential
- AFAIK, bake clients will build everything when a patch is submitted: Having multiple clients is mostly intended to run the same tests in multiple configurations
- To implement that feature, some clients would need to wait for sequential parts to complete, so I guess the logic should reside in the server which will be responsible for dispatching tests to clients
I had a look at the Development/Bake/Server/ code, more particularly at the Brain module, but I must confess the code is a bit opaque to me. I would be interested in contributing that feature but I am not sure how/where to start.
The model for doing Windows and Linux tests is that you have multiple tests, typically parameterised (e.g. a pair of OS and Test), and some tests get run by one client, some by another. But having multiple clients should "just work", even if multiple clients can run a single test. In general, if a test T2 depends on a test T1 then all clients will separately do T1 - there is an assumption that the test T1 alters the local machine in some way that is required for T2. But if T3 and T4 also depend on T1, then both clients will do T1, but then start doing different tests.
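To make the per-client behaviour concrete, here is a minimal sketch of the rule described above. It is a toy model, not Bake's actual code: the names `deps` and `plan` are hypothetical, and the dependency table simply mirrors the T1..T4 example.

```haskell
import Data.List (nub)

type Test = String

-- Hypothetical dependency table mirroring the T1..T4 example.
deps :: Test -> [Test]
deps "T2" = ["T1"]
deps "T3" = ["T1"]
deps "T4" = ["T1"]
deps _    = []

-- Everything a single client has to run locally, prerequisites first,
-- before it can run the target test.
plan :: Test -> [Test]
plan t = nub (concatMap plan (deps t) ++ [t])

main :: IO ()
main = do
    putStrLn $ "client A: " ++ show (plan "T3")
    putStrLn $ "client B: " ++ show (plan "T4")
```

Both clients end up with T1 in their plan, which is the duplication described above: T1 is repeated on each machine because its local side effects are assumed to be needed there.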
So generally speaking, throw multiple clients at it, and it should go faster - it's designed that way (and the multiple configurations thing is just a side-effect of that design, not the actual design itself).
OK. But this means I have to allocate clients to some tests explicitly in the description of the tests...
No requirement at all to allocate clients to tests - the Brains module does all of that. The provide/require stuff is to limit decisions; if you leave it all out, any client can do anything.
Yes, but then if I don't set explicit provide/require all clients will do everything all the time, which is not what I want. I guess what I want may be a different model.
It won't aim for all clients to do everything - it aims to do everything once, but if a client is otherwise idle, it will do whatever it can, with the aim of completing everything as fast as it can. The clients will be busy all the time, but there will be no pointless duplication. If you can share some dependency information in your tasks, I could explain exactly what would happen with more clients.
Sure, here is the main bake file:
type ImageId = String
type HostPort = Int

data Action = Compile
            | Dependencies
            | RunDocker ImageId HostPort
            | Deploy
            | IntegrationTest
            | UITest
            | EndToEndTest
            deriving (Show,Read)

instance Stringy Action where
    stringyTo = show
    stringyFrom = read

patchFile :: String
patchFile = ".bake.patch"

imageName :: String
imageName = "capitalmatch/app"

main :: IO ()
main = do
    putStrLn "Running CI"
    let err = "You need to set an environment variable named $REPO for the Git repo"
    repo <- fromMaybe (error err) `fmap` lookupEnv "REPO"
    bake $
        ovenNotifySlack "URI" "Channel" $
        ovenIncremental $
        -- what's this for ? Why the '=' sign...? looks like this is added to patch submitted
        ovenPretty $
        ovenStorePatch $
        ovenGitNotes repo $
        ovenGit repo "master" Nothing $
        ovenNotifyStdout $
        ovenTest (return allTests) execute
        defaultOven{ovenServer=("127.0.0.1",5000)}

-- | Extract latest patch from state or patches information
-- this introduces a constraint for type of patch and state to be equal, which works well
-- for git but might fail for other systems... Anyway, we don't care for now.
ovenStorePatch :: (Stringy patch, state ~ patch) => Oven state patch test -> Oven state patch test
ovenStorePatch oven@Oven{..} = oven
    { ovenPrepare = \ s ps -> do
        let latest = extractLatestPatch s ps
        writeFile patchFile latest
        ovenPrepare s ps
    }

extractLatestPatch :: (Stringy s) => s -> [s] -> String
extractLatestPatch s [] = stringyTo s
extractLatestPatch _ ps = stringyTo $ last ps

-- |Add ''notes'' containing result of build and attach it to head patch
-- The idea is to leverage standard git's notes mechanism to attach meta information from
-- build to commits thus allowing anybody to retrieve information about commits' builds
-- and actions that affected it.
ovenGitNotes :: (Stringy patch, state ~ patch) => String -> Oven state patch test -> Oven state patch test
ovenGitNotes repo oven@Oven{..} = oven
    { ovenUpdate = \ up ps -> do
        s <- ovenUpdate up ps
        pushNotes repo up ps
        return s
    }

pushNotes :: (Stringy s) => String -> s -> [s] -> IO ()
pushNotes repo s ps = do
    let latest = extractLatestPatch s ps
    () <- cmd "git fetch" [repo] ["refs/notes/*:refs/notes/*"]
    () <- cmd "git notes --ref=bake append" ["-m", "build successful", latest]
    () <- cmd "git push" [repo] ["refs/notes/*"]
    return ()

allTests :: [Action]
allTests = [ Compile
           , Dependencies
           , IntegrationTest
           , UITest
           , EndToEndTest
           , Deploy
           , RunDocker "app" 8080
           , RunDocker "app-dev" 8081
           ]

execute :: Action -> TestInfo Action
execute Dependencies = run $ do
    opt <- addPath ["."] []
    () <- cmd opt "./build.sh images/ghc-clojure.uuid"
    sleep 1
    incrementalDone
execute Compile = depend [Dependencies] $ run $ do
    opt <- addPath ["."] []
    () <- cmd opt "./build.sh"
    sleep 1
    incrementalDone
execute IntegrationTest = depend [Compile] $ run $ do
    opt <- addPath ["."] []
    () <- cmd opt "./build.sh test"
    incrementalDone
execute UITest = depend [Compile] $ run $ do
    opt <- addPath ["."] []
    () <- cmd opt "./build.sh ui-test"
    incrementalDone
execute EndToEndTest = depend [Compile] $ run $ do
    opt <- addPath ["."] []
    () <- cmd opt "./build.sh end-to-end-tests"
    incrementalDone
execute Deploy = depend [IntegrationTest, UITest, EndToEndTest] $ run $ do
    patch <- readFile patchFile
    -- we make sure that tag is not set to another image
    () <- cmd $ "docker tag -f " ++ latest imageName ++ " " ++ tagged imageName patch
    () <- cmd $ "docker push " ++ tagged imageName patch
    incrementalDone
  where
    tagged name patch = name ++ ":" ++ patch
    latest name = tagged name "latest"
execute (RunDocker image port) = depend [UITest, IntegrationTest, EndToEndTest] $ run $ do
    Exit _ <- cmd $ "docker stop " ++ image
    Exit _ <- cmd $ "docker rm " ++ image
    () <- cmd "docker run" [ "--name=" ++ image
                           , "-d"
                           , "-p", show port ++ ":8080"
                           , "-v", "/home/build/data-" ++ image ++ ":/data"
                           , "capitalmatch/" ++ image ++ ":latest"
                           ]
    incrementalDone

For info, cmd now supports the AddPath option, which means you can write cmd (AddPath ["."] []) ... - which might be a bit cleaner.
So in your case, since you have a complete set of dependencies through the whole system, an extra client will hurt you and run all the tests twice. I think the problem is that we have depend, which requires that UITest have been run on this client successfully before you continue. What you want is dependAnywhere, which requires the test to be finished, but not on this particular machine: since there is no side effect from UITest, you just want to ensure it has passed. With that hypothetical and entirely reasonable addition, would you then see how things could work out? I suspect you would also want to tie the deployment to an individual client, so you don't accidentally deploy twice.
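The difference between the two kinds of dependency can be sketched as a predicate deciding whether a given client may start a test. This is only an illustration under stated assumptions: dependAnywhere does not exist in Bake at this point in the thread, the `Deps` record and `canRun` are hypothetical names, and making Deploy depend locally on Compile is an assumption made here so that both kinds appear in one example.

```haskell
import Data.List (union)

data Action = Compile | IntegrationTest | UITest | EndToEndTest | Deploy
    deriving (Show, Eq)

-- Hypothetical description of one test's prerequisites.
data Deps = Deps
    { localDeps    :: [Action]  -- must have passed on this very client (today's depend)
    , anywhereDeps :: [Action]  -- must have passed on some client (proposed dependAnywhere)
    }

-- Assumed dependency table, loosely following the bake file in this thread.
depsOf :: Action -> Deps
depsOf Compile = Deps [] []
depsOf Deploy  = Deps [Compile] [IntegrationTest, UITest, EndToEndTest]
depsOf _       = Deps [Compile] []

-- passedHere: tests this client has passed; passedAll: tests passed by any client.
canRun :: [Action] -> [Action] -> Action -> Bool
canRun passedHere passedAll a =
    all (`elem` passedHere) (localDeps d) && all (`elem` passedAll) (anywhereDeps d)
  where
    d = depsOf a

main :: IO ()
main = do
    let clientA = [Compile, UITest]
        clientB = [Compile, IntegrationTest, EndToEndTest]
        everyone = clientA `union` clientB
    -- Client A never ran IntegrationTest or EndToEndTest itself, yet may deploy.
    print (canRun clientA everyone Deploy)
    -- Without the results from client B, the same client has to wait.
    print (canRun clientA clientA Deploy)
```

In this model each test suite runs once, on whichever client picks it up, and Deploy only waits for the global results.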
Thanks, that's also what I inferred: adding another client won't help us as-is. And your proposal definitely makes sense. The model I have in mind is that of a DAG of "tests" that is projected onto all available client instances, in parallel where possible.
Yep, that's what we'd have if dependencies worked the way you wanted - I'll see what/when I can do that. Any indication how important it is for you?
Between "Nice to have" and "Should have"... I would be happy to help if you can point me in the right direction :-)
Brains is the beating heart and soul of Bake, it's also the most confusing bit. First thing is to plumb it through https://github.com/ndmitchell/bake/blob/master/src/Development/Bake/Core/Type.hs as an additional field. Once it reads https://github.com/ndmitchell/bake/blob/master/src/Development/Bake/Server/Brain.hs#L249-L307 you need to find the transitive closure of this new type of depends (just like I do for the existing depends), then add a filter in suitable.
It might actually not be too hard, having looked at it - since the existing depends is exactly the same for 90% of the places - but I wouldn't be shocked if it turned out to be deeper.
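As a rough idea of the two steps mentioned (transitive closure, then a filter), here is a standalone sketch. It does not use or reproduce the real Brain.hs code: `anywhere`, `closure` and `suitableAnywhere` are hypothetical names, and the "Notify" test is invented purely to give the closure more than one level.

```haskell
type Test = String

-- Hypothetical direct dependencies of the new "anywhere" kind.
anywhere :: Test -> [Test]
anywhere "Deploy" = ["IntegrationTest", "UITest", "EndToEndTest"]
anywhere "Notify" = ["Deploy"]  -- invented test, for illustration only
anywhere _        = []

-- Transitive closure of a dependency function, excluding the start test.
-- The accumulator of already seen tests also guards against cycles.
closure :: (Test -> [Test]) -> Test -> [Test]
closure f t = go [] (f t)
  where
    go seen [] = reverse seen
    go seen (x:xs)
        | x `elem` seen = go seen xs
        | otherwise     = go (x : seen) (xs ++ f x)

-- Keep only the candidates whose whole closure has already passed somewhere.
suitableAnywhere :: [Test] -> [Test] -> [Test]
suitableAnywhere passedAnywhere =
    filter (\t -> all (`elem` passedAnywhere) (closure anywhere t))

main :: IO ()
main = do
    print (closure anywhere "Notify")
    print (suitableAnywhere ["IntegrationTest", "UITest", "EndToEndTest"]
                            ["Deploy", "Notify", "UITest"])
```

Here "Notify" is filtered out because "Deploy" itself has not passed yet, even though all three test suites have.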
To elaborate: UI Tests, integration tests and ETE Tests are an integral part of the deployment process and take a significant share of the time it takes to build our system. Here are the stats from bake:
Compile          47  5m14s   4h06m  31m35s
UITest           11  3m43s  40m57s   3m59s
EndToEndTest     46  2m48s   2h08m   6m21s
IntegrationTest  45  2m17s   1h43m   2m52s
Parallelizing execution of those tests would provide us on average a speed-up of 36% which is not negligible...
Hmm, that computation is actually wrong, as I did not take into account the downstream (sequential) steps, but you get the idea :-)
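For reference, one way to arrive at a figure close to 36% from the stats above is sketched below. It assumes that the first duration in each row is the mean run time, and that after Compile the three test suites run at the same time on separate clients. As noted just above, it ignores the other steps (Dependencies, Deploy, RunDocker), so it is only a rough estimate.

```haskell
-- Assumed mean durations in seconds, read off the first duration of each row.
compile, uiTest, endToEnd, integration :: Double
compile     = 5 * 60 + 14
uiTest      = 3 * 60 + 43
endToEnd    = 2 * 60 + 48
integration = 2 * 60 + 17

-- One client runs the four steps back to back.
sequential :: Double
sequential = compile + uiTest + endToEnd + integration

-- Assumption: after Compile, the three suites run at once on separate clients,
-- so the slowest suite determines the elapsed time.
parallel :: Double
parallel = compile + maximum [uiTest, endToEnd, integration]

-- Fraction of wall-clock time saved by the parallel schedule.
saving :: Double
saving = (sequential - parallel) / sequential

main :: IO ()
main = do
    putStrLn $ "sequential: " ++ show sequential ++ "s"
    putStrLn $ "parallel:   " ++ show parallel ++ "s"
    putStrLn $ "saving:     " ++ show (round (saving * 100) :: Int) ++ "%"
```

With these inputs the sequential run takes 842s, the parallel one 537s, and the saving rounds to 36%.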