
`stop()` in `physical.py` fails to re-image VMs when agent crashes

Status: Open. para0x0dise opened this issue 6 months ago • 2 comments

My Setup

I'm using physical machines managed by a FOG server, which re-images them after each malware analysis. When an analysis completes, CAPE calls the stop() method in physical.py to reset the machine. This method checks the machine's state and, if it is running, creates a deployment task on the FOG server to restore the machine to a clean snapshot.

Here’s the relevant part of the code:

def stop(self, label):
    """Stop a physical machine.
    @param label: physical machine name.
    @raise CuckooMachineError: if unable to stop.
    """
    taskID_Deploy = 0
    hostID = 0

    # IF THE AGENT HAS CRASHED, THIS CONDITION IS NOT MET
    # AND THE VM IS NOT RE-IMAGED
    if self._status(label) == self.RUNNING:
        log.debug("Rebooting machine: %s", label)
        machine = self._get_machine(label)

        # `headers` is defined elsewhere in physical.py (not shown)
        r_hosts = requests.get(f"http://{self.options.fog.hostname}/fog/host", headers=headers)
        hosts = r_hosts.json()["hosts"]

        for host in hosts:
            if machine.name == host["name"]:
                print(f"{host['id']}: {host['name']}")
                hostID = host["id"]
                r_types = requests.get(f"http://{self.options.fog.hostname}/fog/tasktype", headers=headers)
                types = r_types.json()
                # ... (remainder of the method omitted)
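The host lookup in the loop above can be pulled out into a small helper, which makes the name matching testable without a FOG server. This is a minimal sketch that assumes only what the snippet shows: `GET /fog/host` returns JSON with a `hosts` list whose entries carry `id` and `name`. The helper name `find_fog_host_id` is hypothetical and not part of physical.py.

```python
from typing import Any, Dict, Optional


def find_fog_host_id(hosts_payload: Dict[str, Any], machine_name: str) -> Optional[int]:
    """Return the FOG host ID whose name matches machine_name, or None if no host matches.

    Assumes the payload shape used in physical.py: {"hosts": [{"id": ..., "name": ...}, ...]}.
    """
    for host in hosts_payload.get("hosts", []):
        if host.get("name") == machine_name:
            # IDs may arrive as strings in JSON; normalize to int before building task URLs.
            return int(host["id"])
    return None


# Usage with a canned payload instead of a live requests.get() call
payload = {"hosts": [{"id": "3", "name": "cape-win10"}, {"id": "7", "name": "cape-win7"}]}
print(find_fog_host_id(payload, "cape-win7"))  # 7
print(find_fog_host_id(payload, "missing"))    # None
```

Returning None, rather than leaving `hostID` at its initial value of 0, lets the caller detect a missing FOG host and raise an error instead of issuing a task for a non-existent host.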

Current Behavior

When the agent inside the VM crashes, self._status(label) does not return RUNNING. As a result, the VM is skipped and never re-imaged, leaving it in an infected state indefinitely.

# IF THE AGENT CRASHES, THIS CONDITION IS NEVER TRIGGERED,
# AND THE VM WILL NOT BE RE-IMAGED
if self._status(label) == self.RUNNING:
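The failure mode can be reproduced with a small model of this gate. It is a hypothetical sketch: the status names below are illustrative stand-ins, not the constants physical.py defines, and it assumes (as described above) that a crashed agent makes the status check return something other than RUNNING.

```python
from enum import Enum


class MachineStatus(Enum):
    # Illustrative states only; physical.py defines its own constants.
    RUNNING = "running"
    AGENT_UNREACHABLE = "agent_unreachable"
    POWEROFF = "poweroff"


def original_gate(status: MachineStatus) -> bool:
    """Mirror of `if self._status(label) == self.RUNNING:`; True means re-image."""
    return status == MachineStatus.RUNNING


for status in MachineStatus:
    print(f"{status.name}: re-image={original_gate(status)}")
# RUNNING: re-image=True
# AGENT_UNREACHABLE: re-image=False
# POWEROFF: re-image=False
```

Only the RUNNING state leads to a deployment, so any state produced by a crashed agent leaves the machine un-imaged.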

Fix Attempt 1

To work around this, I changed the condition to check whether self._get_machine(label) returns a machine object, instead of relying on self._status(label):

machine = self._get_machine(label)

# if self._status(label) == self.RUNNING:
if machine:
    log.debug("Rebooting machine: %s", label)
    # machine = self._get_machine(label)
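The workaround's intent, re-imaging whenever the machine is known regardless of what the agent reports, can be expressed as a standalone function. This is a hypothetical restructuring, not CAPE code: `get_machine`, `get_status`, and `deploy` are injected callables standing in for `self._get_machine`, `self._status`, and the FOG deployment request, and the `"running"` status string is an assumption.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


def stop_machine(
    label: str,
    get_machine: Callable[[str], Optional[Any]],
    get_status: Callable[[str], str],
    deploy: Callable[[Any], None],
) -> bool:
    """Re-image `label` whenever it is a known machine; return True if a deploy was issued."""
    machine = get_machine(label)
    if machine is None:
        # Unknown label: nothing to re-image. The real method may prefer raising CuckooMachineError.
        log.warning("No machine configured for label %s", label)
        return False

    status = get_status(label)
    if status != "running":
        # The agent may have crashed; re-image anyway instead of skipping the machine.
        log.info("Machine %s reported status %r; re-imaging regardless", label, status)

    deploy(machine)
    return True


deployed = []
issued = stop_machine(
    "cape-win10",
    get_machine=lambda label: {"name": label},
    get_status=lambda label: "agent_unreachable",
    deploy=deployed.append,
)
print(issued, deployed)  # True [{'name': 'cape-win10'}]
```

Injecting the three callables keeps the decision logic separate from the FOG HTTP calls, so the crashed-agent path can be tested directly.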

New Problem Introduced

Although this workaround does trigger re-imaging when the agent crashes, it appears to cause another issue: machines whose agent crashed are no longer used in subsequent analyses. I suspect they are being marked as inactive or removed from the machine pool in the SQLAlchemy-backed database.
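The cause of the pool issue is unconfirmed, but one defensive step is to check that the agent answers again after the FOG deployment finishes, before the machine is treated as available. The sketch below polls the agent over HTTP. It assumes the CAPE agent listens on its default port 8000 and answers a plain GET on `/`; the helpers `agent_responds` and `wait_for_agent` are hypothetical names, not existing CAPE functions.

```python
import http.client
import time
import urllib.request
from typing import Callable


def agent_responds(ip: str, port: int = 8000, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to the agent root returns 200 (assumes default port 8000)."""
    try:
        with urllib.request.urlopen(f"http://{ip}:{port}/", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all OSError subclasses.
        return False


def wait_for_agent(
    probe: Callable[[], bool],
    attempts: int = 60,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or `attempts` run out; True means the agent is back."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False


# Usage: simulate an agent that comes back on the third probe, without real sleeping.
responses = iter([False, False, True])
print(wait_for_agent(lambda: next(responses), attempts=5, interval=0, sleep=lambda s: None))  # True
```

In a real deployment the probe would be something like `lambda: agent_responds(machine_ip)`; the injectable `sleep` only exists so the polling loop can be tested quickly.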

para0x0dise, May 22 '25 11:05

Hmm, well, first: which agent version are you using? It sounds to me like we should fix the crash rather than the behavior upon crashing.

kevoreilly, May 22 '25 12:05

Sorry for the delay. I'm using the latest agent version, 0.19.

para0x0dise, May 24 '25 19:05