CAPEv2
`stop()` in `physical.py` fails to re-image VMs when agent crashes
My Setup
I'm using physical machines managed by a FOG server to re-image VMs after each malware analysis. After an analysis completes, CAPE calls the stop() method in physical.py to reset the machine. This method checks the VM state; if it's running, it triggers a deployment task via the FOG server to restore the VM to a clean snapshot.
Here’s the relevant part of the code:
```python
def stop(self, label):
    """Stop a physical machine.
    @param label: physical machine name.
    @raise CuckooMachineError: if unable to stop.
    """
    taskID_Deploy = 0
    hostID = 0
    ## IF THE AGENT HAS CRASHED, THIS CONDITION ISN'T TRIGGERED
    ## AND THE VM ISN'T RE-IMAGED
    if self._status(label) == self.RUNNING:
        log.debug("Rebooting machine: %s", label)
        machine = self._get_machine(label)
        r_hosts = requests.get(f"http://{self.options.fog.hostname}/fog/host", headers=headers)
        hosts = r_hosts.json()["hosts"]
        for host in hosts:
            if machine.name == host["name"]:
                print(f"{host['id']}: {host['name']}")
                hostID = host["id"]
        r_types = requests.get(f"http://{self.options.fog.hostname}/fog/tasktype", headers=headers)
        types = r_types.json()
```
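For context, the excerpt above cuts off right after fetching `/fog/tasktype`; the next step is presumably picking out the "Deploy" task type's ID. A minimal sketch of that step (the helper name and the sample payload shape are my assumptions for illustration, not CAPE's verbatim code):

```python
def find_task_type_id(tasktypes, name="Deploy"):
    """Return the ID of the FOG task type whose name matches, or 0 if absent."""
    for task_type in tasktypes:
        if task_type["name"].lower() == name.lower():
            return int(task_type["id"])
    return 0

# Hypothetical sample of the /fog/tasktype response shape:
types = {"tasktypes": [{"id": "1", "name": "Deploy"}, {"id": "2", "name": "Capture"}]}
taskID_Deploy = find_task_type_id(types["tasktypes"])
print(taskID_Deploy)  # 1
```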
Current Behavior
When the agent inside the VM crashes, self._status(label) does not return RUNNING. As a result, the VM is skipped and never re-imaged, leaving it in an infected state indefinitely.
```python
# IF THE AGENT CRASHES, THIS CONDITION IS NEVER TRIGGERED,
# AND THE VM WILL NOT BE RE-IMAGED
if self._status(label) == self.RUNNING:
```
Fix Attempt 1
To work around this, I modified the condition to check if the machine object is returned by self._get_machine(label) instead of relying on self._status(label):
```python
machine = self._get_machine(label)
# if self._status(label) == self.RUNNING:
if machine:
    log.debug("Rebooting machine: %s", label)
    # machine = self._get_machine(label)
```
New Problem Introduced
While this workaround successfully initiates re-imaging even when the agent crashes, it appears to introduce another issue: machines whose agent crashed are no longer scheduled for subsequent analyses. I suspect they are being marked as inactive or removed from the machine pool in the SQLAlchemy-backed database.
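One way to frame the trade-off: the stop path needs to distinguish "agent unreachable but machine known" (re-image, keep in the pool) from "machine unknown" (genuine error). A minimal sketch of that decision, assuming an unreachable agent yields a status of `None` (the helper name and the `None` convention are hypothetical, not CAPE's actual API):

```python
RUNNING = "running"

def should_reimage(status, machine):
    """Decide whether stop() should trigger a FOG deploy task.

    Re-image when the guest reports RUNNING, but also when a machine
    object exists while the agent is unreachable (status is None) --
    the crashed-agent case the original condition misses. An unknown
    label (no machine object) is still refused rather than silently
    re-imaged.
    """
    if status == RUNNING:
        return True
    # Agent crashed or unreachable, but the machine itself is known.
    return machine is not None

print(should_reimage(None, object()))  # True: crashed agent still re-imaged
print(should_reimage(None, None))      # False: unknown machine
```

Keeping this re-image decision separate from whatever updates the machine's state in the database might avoid the side effect where crashed machines silently drop out of the scheduler's pool.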
Hmm well firstly which agent version are you using? It sounds to me like we should fix the crash rather than the behavior upon crashing.
Sorry for the delay. I'm using the latest agent version, 0.19.