bake How to run tests in parallel with multiple clients?

I guess everything is in the question but let's elaborate a bit:

The build process is a "complex" pipeline made of several stages, some of which can be run in parallel (e.g. integration tests, end-to-end tests, UI tests), while others must be sequential.
AFAIK, bake clients will build everything when a patch is submitted: having multiple clients is mostly intended to run the same tests in multiple configurations.
To implement that feature, some clients would need to wait for sequential parts to complete, so I guess the logic should reside in the server, which would be responsible for dispatching tests to clients.

I had a look at the Development/Bake/Server/ code, more particularly at the Brain module, but I must confess the code is a bit opaque to me. I would be interested in contributing that feature but am not sure how or where to start.

Sep 27 '15 07:09 abailly

The model for doing Windows and Linux tests is that you have multiple tests, typically parameterised (e.g. a pair of OS and Test), and some tests get run by one client, some by another. But having multiple clients should "just work", even if multiple clients can run a single test. In general, if a test T2 depends on a test T1 then all clients will separately do T1 - there is an assumption that the test T1 alters the local machine in some way that is required for T2. But if T3 and T4 also depend on T1, then both clients will do T1, but then start doing different tests.

So generally speaking, throw multiple clients at it, and it should go faster - it's designed that way (and the multiple configurations thing is just a side-effect of that design, not the actual design itself).

Sep 27 '15 20:09 ndmitchell

OK. But this means I have to allocate clients to some tests explicitly in the description of the tests...

Sep 28 '15 07:09 abailly

There is no requirement at all to allocate clients to tests - the Brain module does all of that. The provide/require stuff is there to limit decisions; if you leave it all out, any client can do anything.

Sep 28 '15 07:09 ndmitchell

Yes, but then if I don't set explicit provide/require all clients will do everything all the time, which is not what I want. I guess what I want may be a different model.

Sep 28 '15 07:09 abailly

It won't aim for all clients to do everything - it aims to do everything once, but if a client is otherwise idle, it will do whatever it can, with the aim of completing everything as fast as it can. The clients will be busy all the time, but there will be no pointless duplication. If you can share some dependency information in your tasks, I could explain exactly what would happen with more clients.

Sep 28 '15 07:09 ndmitchell

Sure, here is the main bake file:

type ImageId = String
type HostPort = Int

data Action = Compile
            | Dependencies
            | RunDocker ImageId HostPort
            | Deploy
            | IntegrationTest
            | UITest
            | EndToEndTest
            deriving (Show,Read)

instance Stringy Action where
  stringyTo = show
  stringyFrom = read

patchFile :: String
patchFile = ".bake.patch"

imageName :: String
imageName = "capitalmatch/app"

main :: IO ()
main = do
    putStrLn "Running CI"
    let err = "You need to set an environment variable named $REPO for the Git repo"
    repo <- fromMaybe (error err) `fmap` lookupEnv "REPO"
    bake $
      ovenNotifySlack "URI" "Channel" $
      ovenIncremental $
      -- what's this for ? Why the '=' sign...? looks like this is added to patch submitted
      ovenPretty $
      ovenStorePatch $
      ovenGitNotes repo $
      ovenGit repo "master" Nothing $
      ovenNotifyStdout $
      ovenTest (return allTests) execute
      defaultOven{ovenServer=("127.0.0.1",5000)}

-- | Extract latest patch from state or patches information
-- this introduces a constraint for type of patch and state to be equals, which works well
-- for git but might fail for other systems... Anyway, we don't care for now.
ovenStorePatch :: (Stringy patch, state ~ patch) => Oven state patch test  -> Oven state patch test
ovenStorePatch oven@Oven{..}  = oven { ovenPrepare = \ s ps -> do
                                          let latest = extractLatestPatch s ps
                                          writeFile patchFile latest
                                          ovenPrepare s ps
                                     }


extractLatestPatch :: (Stringy s) => s -> [s] -> String
extractLatestPatch s [] = stringyTo s
extractLatestPatch _ ps = stringyTo $ last ps

-- |Add ''notes'' containing result of build and attach it to head patch
-- The idea is to leverage standard git's notes mechanism to attach meta information from
-- build to commits thus allowing anybody to retrieve information about commits' builds
-- and actions that affected it.
ovenGitNotes :: (Stringy patch, state ~ patch) => String -> Oven state patch test -> Oven state patch test
ovenGitNotes repo oven@Oven{..} = oven{ ovenUpdate = \ up ps -> do
                                           s <- ovenUpdate up ps
                                           pushNotes repo up ps
                                           return s
                                        }

pushNotes :: (Stringy s) => String -> s -> [s] -> IO ()
pushNotes repo s ps = do
  let latest = extractLatestPatch s ps
  () <-  cmd "git fetch" [repo] ["refs/notes/*:refs/notes/*"]
  () <-  cmd "git notes --ref=bake append" ["-m", "build successful", latest]
  () <-  cmd "git push" [repo] ["refs/notes/*"]
  return ()


allTests :: [Action]
allTests = [ Compile
           , Dependencies
           , IntegrationTest
           , UITest
           , EndToEndTest
           , Deploy
           , RunDocker "app" 8080
           , RunDocker "app-dev" 8081
           ]

execute :: Action -> TestInfo Action
execute Dependencies = run $ do
  opt <- addPath ["."] []
  () <- cmd opt "./build.sh images/ghc-clojure.uuid"
  sleep 1
  incrementalDone

execute Compile = depend [Dependencies] $ run $ do
  opt <- addPath ["."] []
  () <- cmd opt "./build.sh"
  sleep 1
  incrementalDone

execute IntegrationTest = depend [Compile] $ run $ do
  opt <- addPath ["."] []
  () <- cmd opt "./build.sh test"
  incrementalDone

execute UITest = depend [Compile] $ run $ do
  opt <- addPath ["."] []
  () <- cmd opt "./build.sh ui-test"
  incrementalDone

execute EndToEndTest = depend [Compile] $ run $ do
  opt <- addPath ["."] []
  () <- cmd opt "./build.sh end-to-end-tests"
  incrementalDone

execute Deploy  = depend [IntegrationTest, UITest, EndToEndTest] $ run $ do
  patch <- readFile patchFile
  -- we make sure that tag is not set to another image
  () <- cmd $ "docker tag -f " ++ latest imageName ++ " " ++ tagged imageName patch
  () <- cmd $ "docker push " ++ tagged imageName patch
  incrementalDone
    where
      tagged name patch = name ++ ":" ++ patch
      latest name       = tagged name "latest"

execute (RunDocker image port) = depend [UITest, IntegrationTest, EndToEndTest] $ run $ do
  Exit _ <- cmd $ "docker stop " ++ image
  Exit _ <- cmd $ "docker rm " ++ image
  () <- cmd "docker run" [ "--name=" ++ image
                         , "-d"
                         , "-p",  show port ++ ":8080"
                         , "-v", "/home/build/data-" ++ image ++ ":/data"
                         , "capitalmatch/" ++ image ++ ":latest"
                         ]
  incrementalDone

Sep 28 '15 07:09 abailly

For info, cmd now supports the AddPath option, which means you can write cmd (AddPath ["."] []) ... - which might be a bit cleaner.
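As a rough illustration of that suggestion, the repeated addPath/cmd pair in the execute clauses could be folded into one helper. This is only a sketch: buildSh and its target parameter are hypothetical names, and it assumes the shake package is installed and that ./build.sh exists in the working directory, as in the bake file above.

```haskell
import Development.Shake.Command (CmdOption (AddPath), cmd)

-- | Run ./build.sh with the current directory prepended to $PATH.
-- `buildSh` and `target` are hypothetical names used for illustration only.
buildSh :: String -> IO ()
buildSh target = cmd (AddPath ["."] []) "./build.sh" [target]

-- Example use: the equivalent of the IntegrationTest step's command.
main :: IO ()
main = buildSh "test"
```

With such a helper, each test body would reduce to a call like buildSh "ui-test" followed by incrementalDone.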

So in your case, since you have a complete set of dependencies through the whole system, an extra client will hurt you and run all the tests twice. I think the problem is that we have depend, which requires that UITest has been run successfully on this client before you continue. What you want is dependAnywhere, which requires the test to be finished, but not on this particular machine: since there is no side effect from UITest, you just want to ensure it has passed. With that hypothetical and entirely reasonable addition, would you then see how things could work out? I suspect you would also want to tie the deployment to an individual client, so you don't accidentally deploy twice.
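To make the difference between depend and the proposed dependAnywhere concrete, here is a minimal standalone sketch. It does not use Bake's real types: Test, Deps, deps and canRun are all hypothetical names, the test list is a simplified subset of the bake file above, and the real decision logic in the Brain module is more involved.

```haskell
import qualified Data.Set as Set

-- Hypothetical, simplified model; these are not Bake's real types.
data Test = Compile | IntegrationTest | UITest | EndToEndTest | Deploy
  deriving (Show, Eq, Ord)

-- | Dependencies of a test, split by kind.
data Deps = Deps
  { dependLocal    :: [Test]  -- must have passed on this same client (like `depend`)
  , dependAnywhere :: [Test]  -- must have passed on some client (the proposed addition)
  }

deps :: Test -> Deps
deps Compile         = Deps [] []
deps IntegrationTest = Deps [Compile] []
deps UITest          = Deps [Compile] []
deps EndToEndTest    = Deps [Compile] []
deps Deploy          = Deps [] [IntegrationTest, UITest, EndToEndTest]

-- | Can a client start this test, given what has passed on that client
-- and what has passed on any client?
canRun :: Set.Set Test -> Set.Set Test -> Test -> Bool
canRun passedHere passedAnywhere t =
     all (`Set.member` passedHere)     (dependLocal d)
  && all (`Set.member` passedAnywhere) (dependAnywhere d)
  where
    d = deps t

main :: IO ()
main = do
  let passedHere     = Set.fromList [Compile, UITest]
      passedAnywhere = Set.fromList [Compile, IntegrationTest, UITest, EndToEndTest]
  -- This client only ran Compile and UITest itself, yet it may deploy,
  -- because the other tests passed on other clients.
  print (canRun passedHere passedAnywhere Deploy)
  -- A client that has not compiled locally still cannot run UITest.
  print (canRun Set.empty passedAnywhere UITest)
```

In this model, moving the three test names from dependAnywhere to dependLocal for Deploy reproduces the current behaviour, where the deploying client has to run every test itself.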

Sep 28 '15 08:09 ndmitchell

Thanks, that's also what I inferred: adding another client won't help us as-is. And your proposal definitely makes sense. The model I have in mind is that of a DAG of "tests" that is projected onto all available client instances, running in parallel where possible.

Sep 28 '15 08:09 abailly

Yep, that's what we'd have if dependencies worked the way you wanted - I'll see what/when I can do that. Any indication how important it is for you?

Sep 28 '15 09:09 ndmitchell

Between "Nice to have" and "Should have"... I would be happy to help if you can point me in the right direction :-)

Sep 28 '15 09:09 abailly

Brains is the beating heart and soul of Bake, but it's also the most confusing bit. The first thing is to plumb it through https://github.com/ndmitchell/bake/blob/master/src/Development/Bake/Core/Type.hs as an additional field. Once it reaches https://github.com/ndmitchell/bake/blob/master/src/Development/Bake/Server/Brain.hs#L249-L307 you need to find the transitive closure of this new type of depends (just like I do for the existing depends), then add a filter in suitable.
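For readers unfamiliar with the term, a transitive closure over a dependency map can be sketched as below. This is a generic illustration, not the code in Brain.hs: transitiveDeps and graph are hypothetical names, and tests are represented as plain strings for brevity.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- | Every test reachable from the given test by following dependency edges.
-- The `seen` set prevents revisiting a test, so cycles do not loop forever.
transitiveDeps :: Ord t => Map.Map t [t] -> t -> Set.Set t
transitiveDeps direct = go Set.empty . next
  where
    next t = Map.findWithDefault [] t direct
    go seen [] = seen
    go seen (t:ts)
      | t `Set.member` seen = go seen ts
      | otherwise           = go (Set.insert t seen) (next t ++ ts)

-- | Direct dependencies, mirroring the `depend` calls in the bake file above.
graph :: Map.Map String [String]
graph = Map.fromList
  [ ("Compile",         ["Dependencies"])
  , ("IntegrationTest", ["Compile"])
  , ("UITest",          ["Compile"])
  , ("EndToEndTest",    ["Compile"])
  , ("Deploy",          ["IntegrationTest", "UITest", "EndToEndTest"])
  ]

main :: IO ()
main = print (Set.toList (transitiveDeps graph "Deploy"))
-- prints ["Compile","Dependencies","EndToEndTest","IntegrationTest","UITest"]
```

The result for a test is what a filter like suitable would need to check against, whether "passed on this client" or "passed anywhere" depending on the kind of dependency.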

It might actually not be too hard, having looked at it - since the existing depends is exactly the same for 90% of the places - but I wouldn't be shocked if it turned out to be deeper.

Sep 28 '15 09:09 ndmitchell

To elaborate: UI tests, integration tests and end-to-end tests are an integral part of the deployment process and take a significant share of the time it takes to build our system. Here are the stats from bake:

Compile          47  5m14s  4h06m   31m35s
UITest           11  3m43s  40m57s  3m59s
EndToEndTest     46  2m48s  2h08m   6m21s
IntegrationTest  45  2m17s  1h43m   2m52s

Parallelizing execution of those tests would give us an average speed-up of 36%, which is not negligible...
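One way to arrive at that figure, assuming the second time column of the stats is the mean duration and that only Compile plus the three test stages are counted, is to compare running the three tests one after another with running them side by side (where the slowest one dominates). The names below are purely illustrative.

```haskell
-- Mean durations in seconds, taken from the stats above
-- (assumption: the second time column is the mean).
compile, uiTest, endToEnd, integration :: Double
compile     = 5 * 60 + 14  -- 5m14s
uiTest      = 3 * 60 + 43  -- 3m43s
endToEnd    = 2 * 60 + 48  -- 2m48s
integration = 2 * 60 + 17  -- 2m17s

-- | Compile, then the three tests one after another.
sequential :: Double
sequential = compile + sum [uiTest, endToEnd, integration]

-- | Compile, then the three tests at the same time on separate clients.
parallel :: Double
parallel = compile + maximum [uiTest, endToEnd, integration]

-- | Time saved, as a percentage of the sequential duration.
speedUpPercent :: Int
speedUpPercent = round (100 * (sequential - parallel) / sequential)

main :: IO ()
main = print speedUpPercent  -- prints 36
```

That is 842 seconds sequentially against 537 seconds in parallel. The estimate leaves out the other stages of the pipeline (Dependencies, Deploy, RunDocker), so the gain over a whole build would be smaller.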

Sep 28 '15 09:09 abailly

Hmm, the computation is actually wrong, as I do not take into account downstream (sequential) steps, but you get the idea :-)

Sep 28 '15 09:09 abailly
