My Setup

I'm using physical machines managed by a FOG server, which re-images them after each malware analysis (I refer to these machines as VMs below). After an analysis completes, CAPE calls the stop() method in physical.py to reset the machine. This method checks the VM state; if it's running, it triggers a deployment task via the FOG server to restore the VM to a clean image.

Here’s the relevant part of the code:

def stop(self, label):
    """Stop a physical machine.
    @param label: physical machine name.
    @raise CuckooMachineError: if unable to stop.
    """
    taskID_Deploy = 0
    hostID = 0

    # IF THE AGENT CRASHES, THIS CONDITION IS NEVER TRIGGERED,
    # AND THE VM WILL NOT BE RE-IMAGED
    if self._status(label) == self.RUNNING:
        log.debug("Rebooting machine: %s", label)
        machine = self._get_machine(label)

        # Ask the FOG server for all registered hosts
        r_hosts = requests.get(f"http://{self.options.fog.hostname}/fog/host", headers=headers)
        hosts = r_hosts.json()["hosts"]

        # Find the FOG host that matches this machine, then fetch the task types
        for host in hosts:
            if machine.name == host["name"]:
                print(f"{host['id']}: {host['name']}")
                hostID = host["id"]
                r_types = requests.get(f"http://{self.options.fog.hostname}/fog/tasktype", headers=headers)
                types = r_types.json()
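
The host lookup in the loop above could be pulled into a small helper so that the "no matching host" case is explicit instead of leaving hostID at 0. This is only a sketch: find_host_id is a hypothetical name, the sample host names and IDs are invented, and only the "name" and "id" keys are taken from the excerpt.

```python
def find_host_id(hosts, machine_name):
    """Return the ID of the FOG host named machine_name, or None if absent.

    `hosts` is assumed to be a list of dicts with "name" and "id" keys,
    like the "hosts" list read from the FOG response in the excerpt.
    """
    for host in hosts:
        if host.get("name") == machine_name:
            return host.get("id")
    return None


# Invented sample data, shaped like the list used in the excerpt.
sample_hosts = [
    {"id": "3", "name": "sandbox-01"},
    {"id": "7", "name": "sandbox-02"},
]

matched_id = find_host_id(sample_hosts, "sandbox-02")
missing_id = find_host_id(sample_hosts, "sandbox-99")
```

Returning None for an unknown name lets the caller raise or log an error rather than sending a deploy request for host ID 0.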

Current Behavior

When the agent inside the VM crashes, self._status(label) does not return RUNNING. As a result, the VM is skipped and never re-imaged, leaving it in an infected state indefinitely.

# IF THE AGENT CRASHES, THIS CONDITION IS NEVER TRIGGERED,
# AND THE VM WILL NOT BE RE-IMAGED
if self._status(label) == self.RUNNING:

Fix Attempt 1

To work around this, I modified the condition to check if the machine object is returned by self._get_machine(label) instead of relying on self._status(label):

machine = self._get_machine(label)

# if self._status(label) == self.RUNNING:
if machine:
    log.debug("Rebooting machine: %s", label)
    # machine = self._get_machine(label)
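
The difference between the two conditions can be sketched outside CAPE with plain values. This is a simplified model, not the real physical.py logic: RUNNING stands in for self.RUNNING, the "error" status and the machine dict are invented, and the workaround is written as an explicit None check rather than the truthiness test used above.

```python
RUNNING = "running"  # stand-in for self.RUNNING


def should_reimage_original(status, machine):
    """Original gate: re-image only when the agent reports RUNNING."""
    return status == RUNNING


def should_reimage_workaround(status, machine):
    """Workaround gate: re-image whenever the machine is known,
    regardless of what the agent reports."""
    return machine is not None


# Invented example machine; a crashed agent is modelled as a non-RUNNING status.
machine = {"name": "sandbox-01"}

healthy = (should_reimage_original(RUNNING, machine),
           should_reimage_workaround(RUNNING, machine))
crashed = (should_reimage_original("error", machine),
           should_reimage_workaround("error", machine))
```

With a healthy agent both gates agree; with a crashed agent only the workaround gate lets the re-image go ahead, which matches the behavior described above.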

New Problem Introduced

While this workaround successfully initiates re-imaging even when the agent crashes, it appears to cause another issue: machines with agent crashes are no longer used in subsequent analyses. I suspect this is because they are marked as inactive or removed from the machines pool in the SQLAlchemy-backed database.

May 22 '25 11:05 para0x0dise

Hmm well firstly which agent version are you using? It sounds to me like we should fix the crash rather than the behavior upon crashing.

May 22 '25 12:05 kevoreilly

Hmm well firstly which agent version are you using? It sounds to me like we should fix the crash rather than the behavior upon crashing.

Sorry for the delay. I'm using the latest agent version, 0.19.

May 24 '25 19:05 para0x0dise


`stop()` in `physical.py` fails to re-image VMs when agent crashes
