Is it possible to start multiple instances of VMs at the same time?

Open antranapp opened this issue 2 years ago • 22 comments

We have a bunch of Mac Studios for CI in our office at the moment, but right now we can only use each of them to run a single job at a time, because Xcode is flaky with parallel executions.

Can Cilicon start multiple VMs at the same time that run independently of each other? This would help us utilise the Mac Studios' resources better by parallelising multiple jobs across multiple VMs.

antranapp avatar Feb 06 '23 02:02 antranapp

Hi @antranapp, that's definitely something we're considering implementing; however, I can't give a clear timeline. In theory it should be possible to run up to 2 VMs in parallel. While I haven't tried it myself, I've heard from several people that 2 VMs is a limitation of the Virtualization framework.

Marcocanc avatar Feb 06 '23 09:02 Marcocanc

While it may not be intentional, this already works by starting a second instance of Cilicon. It would be awesome if a single instance were able to manage multiple VMs with different configurations to support building on different macOS versions.

AaronBurchfield avatar Mar 15 '23 07:03 AaronBurchfield

> I've heard from several people that 2 VMs is a limitation of the Virtualization framework

It's not a limitation of the framework or the hardware, but rather the legal agreement. You can find it here and the important part is under 2.b.iii:

"to run up to two (2) additional copies or instances of the Apple Software within virtual operating system environments on each Apple-branded computer" (shortened for brevity)

You will get a VZErrorDomain error 6 if you try to run 3 or more, which tells you it's not allowed. This is a very old part of the agreement, though, so hopefully, if enough people ask Apple to amend it, they may open it up on Apple Silicon to allow more. The new chips are clearly powerful enough to support it.
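A host app could tell this particular failure apart from other start-up errors by inspecting the error's domain and code. The following is a minimal sketch using only Foundation. The helper names `isVMLimitError` and `describe` are hypothetical, and the domain string and code 6 are taken from the comment above rather than from the framework headers. An app that links Virtualization.framework would probably compare against the framework's own error constants instead of these literals.

```swift
import Foundation

/// Returns true when `error` looks like the "too many VMs" failure quoted
/// above (domain "VZErrorDomain", code 6). Both literals are assumptions
/// copied from the thread, not from the framework headers.
func isVMLimitError(_ error: Error) -> Bool {
    let nsError = error as NSError
    return nsError.domain == "VZErrorDomain" && nsError.code == 6
}

/// Example use: turn the error into a message an operator can act on.
func describe(_ error: Error) -> String {
    isVMLimitError(error)
        ? "VM limit reached; wait for another VM to stop before retrying"
        : "unexpected error: \(error.localizedDescription)"
}
```

Separating the limit error from everything else lets a second instance wait and retry instead of shutting down permanently.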


With that said, I'm adding a +1 to the ticket: I'd love to see support for two VMs added, for the reason given above of wanting to run multiple CI test jobs in parallel without them impacting one another.

Sherlouk avatar Mar 19 '23 10:03 Sherlouk

Any updates?

ivan-gaydamakin avatar May 02 '23 12:05 ivan-gaydamakin

This would be very interesting for us. What's needed in terms of implementation to support this feature?

ast3150 avatar Aug 18 '23 12:08 ast3150

FYI, opening Cilicon multiple times (using open -n -a Cilicon) results in the network connection being dropped for both instances of Cilicon.

ast3150 avatar Aug 18 '23 12:08 ast3150

Hiya @ast3150 and @Marcocanc, I am running Cilicon right now with two VMs and it seems to be working very well (Cilicon is great full-stop, thank you @Marcocanc for publishing and supporting this project – it really hits the right balance for me between believably ephemeral instances and low-fussiness systems complexity, given a moderate need).

I am doing multiple VMs via a pretty easy workaround, but it is my hope to find some discretionary time so I can submit a real PR for this feature (and possibly a couple of other itch-scratches I'm doing locally). For now I will just document the workaround, which is having two running app instances of Cilicon where the second one looks for a config entitled cilicon2.yml (this requires building your own Cilicon.app for the second app/instance).

  1. Open Cilicon.xcodeproj and change every reference to the build target Cilicon in the project and its build settings to Cilicon2 so you can export an archive build to /Applications to run alongside Cilicon.app. I also changed the PRODUCT_BUNDLE_IDENTIFIER and used my own codesigning in order to separate other potentially shared resources I don't know about that use the bundle domain. It is entirely possible that this is unnecessary and it's enough to simply rename the built app, but I decided to do it this way in order to avoid wondering about edge cases when I had other mysteries to debug.

  2. Change the ConfigManager.swift line static let configPaths = ["/cilicon.yml", "/.cilicon.yml"] to static let configPaths = ["/cilicon2.yml", "/.cilicon2.yml"].

  3. Build app, put Cilicon2.app product wherever you put Cilicon.app.

That's actually it. You can then launch both apps and, at least for me, they both get network access and can both launch different, correctly configured VMs that appear to work correctly so far.
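The lookup that step 2 changes can be pictured as "take the first candidate file that exists". This is a minimal sketch, not Cilicon's actual ConfigManager. The function name `firstExistingConfig` is hypothetical, and so is the assumption that the entries of `configPaths` are appended to a base directory such as the home directory.

```swift
import Foundation

/// Returns the first path of the form `baseDirectory + suffix` that exists
/// on disk, or nil if none of the candidates exist.
/// `suffixes` mirrors the shape of `configPaths` in step 2 (assumption:
/// the real code appends them to a base directory in a similar way).
func firstExistingConfig(in baseDirectory: String,
                         suffixes: [String],
                         fileManager: FileManager = .default) -> String? {
    suffixes
        .map { baseDirectory + $0 }
        .first { fileManager.fileExists(atPath: $0) }
}

// The two app builds would differ only in the suffix list they pass in.
let primarySuffixes = ["/cilicon.yml", "/.cilicon.yml"]
let secondarySuffixes = ["/cilicon2.yml", "/.cilicon2.yml"]
```

Seen this way, the second app build differs from the first only in which suffix list it uses, which is why renaming the config file is enough to keep the two instances apart.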

Halle avatar Oct 18 '23 09:10 Halle

I'm guessing that the design of this inside a single app could be something as low-key as "if a cilicon2.yml exists, open a second window, do all the stuff", and you could probably leave it to the end user to deal with the problem of providing a good cilicon2.yml, at least to start.

Halle avatar Oct 18 '23 09:10 Halle

Hi Halle, I will start working on a (much-needed) refactor of Cilicon with a much cleaner architecture and support for multiple VMs soon. Let me know if you'd like to contribute and we can try to find a way to collaborate on it.

Marcocanc avatar Oct 18 '23 14:10 Marcocanc

I'd enjoy that!

Halle avatar Oct 24 '23 11:10 Halle

BTW, there is one other change needed in code to support this currently in the form I described above. If you follow my instructions above, you will eventually encounter this issue where a complete run leads to a permanent shutdown of one or the other instance. I believe the reason for this is that in the current logic, there is a brief period in which there would be three instances of virtualized macOS, which isn't allowed. I changed this in setupAndRunVirtualMachine():

        Task { @MainActor in
            vmState = .running(virtualMachine)
            try await virtualMachine.start()
        }

to first check whether this instance already has a VM in the .running state and, if so, to stop it before starting the new one, and also to set vmState to .running only after virtualMachine.start() has finished waiting instead of before. This results in a slow restart after a runner run has completed (maybe there is a race condition?), but it is successful.
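The ordering described above (stop whatever this instance still has running, start the new VM, and only then record it as running) can be sketched as follows. This is not the code from the fork. `StartableVM`, `VMState`, `VMRunner` and `replaceRunningVM(with:)` are hypothetical stand-ins that model only the ordering, with no dependency on Virtualization.framework.

```swift
/// Minimal stand-in for the parts of a VM that matter for ordering.
/// Hypothetical protocol; this is not the Virtualization API.
protocol StartableVM: AnyObject {
    func start() async throws
    func stop() async throws
}

enum VMState {
    case idle
    case running(any StartableVM)
}

@MainActor
final class VMRunner {
    private(set) var vmState: VMState = .idle

    /// Stops any VM this instance still considers running, then starts
    /// `next`, and records `.running` only after `start()` has returned.
    func replaceRunningVM(with next: any StartableVM) async throws {
        if case let .running(current) = vmState {
            try await current.stop()
            vmState = .idle
        }
        try await next.start()
        vmState = .running(next)
    }
}
```

Because the old VM is stopped before the new one starts, a single instance never holds two VMs at once, so two instances together should stay within the two-VM limit.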

Halle avatar Oct 25 '23 11:10 Halle

Hi dear Colleagues,

I really like Cilicon, so I wanted to share my current state of running multiple instances:

I was also able to run multiple instances of Cilicon. The only difference, and a more convenient way to run Cilicon, is to add a launch argument and start each instance in the following way:

open /Applications/Cilicon.app -n --args -config-path /Users//cilicon.yml

Running this will start two Cilicon instances using the provided configs:

open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon.yml
open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon2.yml

@main
struct CiliconApp: App {
    /// If no launch argument is found, the default fallback config is used.
    static let fallbackConfig = "\(NSHomeDirectory())/cilicon.yml"

    /// Launch argument to config e.g. ~/cilicon.yml
    static let configPath = UserDefaults.standard.string(forKey: "config-path") ?? fallbackConfig

   ...

}
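The excerpt above works because UserDefaults also exposes launch arguments of the form `-key value` through its argument domain, so `-config-path <path>` becomes readable under the key "config-path". For unit tests it can be handy to have the same decision as a pure function. The sketch below is one hypothetical way to write it; `resolveConfigPath` is not part of Cilicon.

```swift
import Foundation

/// Resolves the config path from a launch-argument list such as
/// `["Cilicon", "-config-path", "/some/path.yml"]`.
/// Falls back to `<homeDirectory>/cilicon.yml` when the flag is missing
/// or has no value after it.
func resolveConfigPath(arguments: [String], homeDirectory: String) -> String {
    let fallback = "\(homeDirectory)/cilicon.yml"
    guard let flagIndex = arguments.firstIndex(of: "-config-path"),
          arguments.indices.contains(flagIndex + 1) else {
        return fallback
    }
    return arguments[flagIndex + 1]
}
```

In an app this could be called as `resolveConfigPath(arguments: CommandLine.arguments, homeDirectory: NSHomeDirectory())`.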

FabianBartels avatar Aug 22 '24 10:08 FabianBartels

Hi @FabianBartels ,

Thanks for sharing; this is an interesting approach. Have you encountered the issue described by @Halle above, where at some point Cilicon tries to have 3 VMs started, which is not allowed?

Cilicon instance 1 -> running 1 VM and waiting for workflows to be picked up
Cilicon instance 2 -> running 1 VM, but when restarting/recreating it, 2 VMs will attempt to run at the same time. I suspect one is the VM being closed and the other is the new one. During this process, Cilicon instance 2 would fail.

If yes, what was the fix for that issue?

ccorneliu avatar Aug 22 '24 12:08 ccorneliu

Hi @ccorneliu,

I checked @Halle's fork, and the only thing that was missing compared to the current main branch of the official Cilicon was a sleep.

So far what works for me is:

  1. run "open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon.yml"
  2. (Optional) wait until it has successfully connected to GitHub
  3. run "open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon2.yml"

I ran multiple tests and all of them worked without any issues.

See sleep in this excerpt:

        logger.log(string: "---------- Starting Up ----------\n")
        try await virtualMachine.start()
        vmState = .running(virtualMachine)

        try await Task.sleep(for: .seconds(5))

Full method:

    @MainActor
    private func setupAndRunVirtualMachine() async throws {
        try await cloneBundle()
        let vmHelper = VMConfigHelper(vmBundle: activeBundle)
        let vmConfig = try vmHelper.computeRunConfiguration(config: config)
        let virtualMachine = VZVirtualMachine(configuration: vmConfig)
        virtualMachine.delegate = self

        logger.log(string: "---------- Starting Up ----------\n")
        try await virtualMachine.start()
        vmState = .running(virtualMachine)
        
        try await Task.sleep(for: .seconds(5))

        self.ip = try await fetchIP(macAddress: clonedBundle.configuration.macAddress.string)
        
        let client = try await SSHClient.connect(
            host: ip,
            authenticationMethod: .passwordBased(username: config.sshCredentials.username, password: config.sshCredentials.password),
            hostKeyValidator: .acceptAnything(),
            reconnect: .always
        )

        if let preRun = config.preRun {
            let streamOutput = try await client.executeCommandStream(preRun, inShell: true)
            for try await blob in streamOutput {
                switch blob {
                case let .stdout(stdout):
                    logger.log(string: String(buffer: stdout))
                case let .stderr(stderr):
                    logger.log(string: String(buffer: stderr))
                }
            }
        }

        if let provisioner {
            do {
                try await provisioner.provision(bundle: activeBundle, sshClient: client)
            } catch {
                logger.log(string: error.localizedDescription + "\n")
            }
        }

        if let postRun = config.postRun {
            let streamOutput = try await client.executeCommandStream(postRun, inShell: true)
            for try await blob in streamOutput {
                switch blob {
                case let .stdout(stdout):
                    logger.log(string: String(buffer: stdout))
                case let .stderr(stderr):
                    logger.log(string: String(buffer: stderr))
                }
            }
        }
        try await client.close()
        logger.log(string: "---------- Shutting Down ----------\n")
        Task { @MainActor in
            try await virtualMachine.stop()
            try await handleStop()
        }
    }




In addition (unrelated to the issue you asked about), I changed the GitHub provisioner:

  • This makes the ephemeral runner unique, which was missing in my opinion.
  • It also solves an auth issue where the runner could not be configured after the Cilicon app was restarted, because the token is lost at that point.
  • Be aware that each start of Cilicon creates a new ephemeral runner in GitHub. GitHub nukes unused ephemeral runners on a daily basis, so it's not a big issue :) I just wanted to mention it.

    func provision(bundle: VMBundle, sshClient: SSHClient) async throws {
        ...
        let runnerName = self.runnerName + "-" + UUID().uuidString.lowercased()
        ...
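The naming change can also be isolated into a small helper, which makes it easy to check that two provisioning runs never register under the same name. This is only an illustrative sketch. `uniqueRunnerName` and its `uuid` parameter are hypothetical and not part of the provisioner.

```swift
import Foundation

/// Appends a lowercase UUID so that every provisioned runner registers
/// under a distinct name. `base` stands in for the configured runner name;
/// `uuid` is injectable only to make the function deterministic in tests.
func uniqueRunnerName(base: String, uuid: UUID = UUID()) -> String {
    base + "-" + uuid.uuidString.lowercased()
}
```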

FabianBartels avatar Aug 22 '24 12:08 FabianBartels

Hey all, sorry, I missed the activity on this issue. I've been (on-and-off) working on a new version of Cilicon that supports multiple VMs.

[Image: Screenshot 2024-08-26 at 10 19 30]

It definitely still needs quite some work. Specifically:

  • The download logic is still not working well. The idea is to have a download manager that will queue OCI downloads. If both VMs want the same image, they should both wait for the same download to finish. If one Runner has the image it needs, it should run while the other is getting its image downloaded.
  • There's a pesky bug that has been haunting the current versions of Cilicon, but is more noticeable when running two VMs at the same time. Sometimes when stopping the VM, try await vm.stop() gets stuck. No Error, no success. I'm leaning towards it being an internal issue in Virtualization.framework, but maybe I'm doing something wrong.
  • We've been testing this version on one of our machines (although only with a single VM), and on rare occasions it crashes with a SwiftUI bug. Cilicon is my only SwiftUI experience. If anyone who's experienced with SwiftUI spots any red flags, please do point them out.
  • Restart scheduling is not fully implemented yet.
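To illustrate the first point, here is one possible shape for the "both VMs wait for the same download" behaviour. It is a sketch under stated assumptions, not the download manager on the branch. `ImageDownloadManager`, `localPath(for:)` and the injected `fetch` closure are hypothetical, and the sketch only coalesces requests for the same image. It does not queue or limit downloads of different images.

```swift
/// Coalesces concurrent requests for the same image into one download.
/// `fetch` is a placeholder for whatever actually pulls the OCI image
/// and returns its local path.
actor ImageDownloadManager {
    private var downloads: [String: Task<String, Error>] = [:]
    private let fetch: @Sendable (String) async throws -> String

    init(fetch: @escaping @Sendable (String) async throws -> String) {
        self.fetch = fetch
    }

    /// Returns the local path for `image`. A download is started only if
    /// no caller has already started one for the same reference; later
    /// callers await the same task. A finished task stays in the table
    /// and acts as a cache; a failed one is removed so it can be retried.
    func localPath(for image: String) async throws -> String {
        if let existing = downloads[image] {
            return try await existing.value
        }
        let fetch = self.fetch
        let task = Task { try await fetch(image) }
        downloads[image] = task
        do {
            return try await task.value
        } catch {
            downloads[image] = nil
            throw error
        }
    }
}
```

A runner whose image is already local would resolve immediately, while the other runner keeps waiting on its own task.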

Happy to accept contributions on the branch if anyone wants to contribute. Will publish the branch and a bleeding edge build in this thread today.

The new version includes breaking changes for the config file. Here's an example of the new structure:

machines: 
  - id: runner-1
    source: oci://ghcr.io/cirruslabs/macos-sonoma-xcode:15.4
    provisioner:
      type: script
      config:
        run: echo Hello World
    hardware:
      ramGigabytes: 8
      cpuCores: 4
  - id: runner-2
    source: oci://ghcr.io/cirruslabs/macos-sonoma-xcode:15.4
    provisioner:
      type: script
      config:
        run: echo Hello World
    hardware:
      ramGigabytes: 8
      cpuCores: 4
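For readers who want to see how such a file might map onto Swift types, here is a minimal Decodable sketch. The field names come from the example above, but the type names, the choice of `Int` and `[String: String]`, and the `duplicateIDs` check are assumptions, not the types on the branch. Any Decoder-based YAML parser could feed these types. The sketch itself depends only on Foundation.

```swift
import Foundation

/// Mirrors the example structure above. Field names are taken from the
/// example; the Swift types and the validation rule are assumptions.
struct MachinesConfig: Decodable {
    struct Machine: Decodable {
        struct Provisioner: Decodable {
            let type: String
            let config: [String: String]
        }
        struct Hardware: Decodable {
            let ramGigabytes: Int
            let cpuCores: Int
        }
        let id: String
        let source: String
        let provisioner: Provisioner
        let hardware: Hardware
    }

    let machines: [Machine]

    /// IDs that appear more than once, sorted; empty when all IDs are unique.
    var duplicateIDs: [String] {
        Dictionary(grouping: machines, by: \.id)
            .filter { $0.value.count > 1 }
            .keys
            .sorted()
    }
}
```

A check like `duplicateIDs` is one simple form of config validation, and it can run right after decoding and before any VM is started.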

Marcocanc avatar Aug 26 '24 08:08 Marcocanc

Here's the branch: https://github.com/traderepublic/Cilicon/tree/cilicon-3.0
And the build: Cilicon 3.zip

Marcocanc avatar Aug 26 '24 15:08 Marcocanc

~Just tried out the v3. I am able to get two runners (GitHub) set up but it seems they show up as the same runner in GitHub. Running jobs will cause both runners to restart once a job finished and they never run two things at the same time :D~

~I'm running them both with source pointing to the same .tart vm? so maybe thats the issue?~

Edit: Was able to resolve this by adding a runnerName to the github provisioner config

Looks very promising!

ElectricCookie avatar Sep 03 '24 13:09 ElectricCookie

@ElectricCookie The configuration validation definitely still needs a bit of work. Unfortunately multi-runner Cilicon is very low priority for us at the moment, as we don't need it internally.

Marcocanc avatar Sep 05 '24 07:09 Marcocanc

@Marcocanc would you be open to a PR implementing Apple's Pkl (https://pkl-lang.org/) as the config file format? Since v3 will have a breaking change anyway, this might be a good moment to switch to a typed and validated config file 😄

ElectricCookie avatar Sep 05 '24 08:09 ElectricCookie

I'd be open to that. We could also try to make it yml compatible (fall back to yml if no pkl is found). The generated code conforms to Decodable anyway. When deploying Cilicon + pkl config, would we have to ship the schema along with it, or is that only required if you want the editor to warn you about a bad config?

Marcocanc avatar Sep 05 '24 12:09 Marcocanc

I think you would amend the template file, which you can import via a URL (https://pkl-lang.org/main/current/language-reference/index.html#module-uris). I'm unsure whether you could simply point to the raw.github URL of the file in the repo, which would have the benefit of being versioned. I'll have a look at this once I get around to tinkering :)

ElectricCookie avatar Sep 09 '24 13:09 ElectricCookie

The preview of Cilicon 3.0 is great! It took me ~10 minutes to set up and get running, and it's working really well. Thanks for putting the effort into making multiple VMs work :sunflower:

tonyarnold avatar Sep 21 '24 01:09 tonyarnold