
RuntimeError during Gradient Computation with Custom YOLOv7 Model Wrapper in PyTorch for Xplique Object Detection Explainability

Open wadie999 opened this issue 2 years ago • 2 comments

I'm working on object detection explainability using the Xplique library with a custom YOLOv7 model, following a Google Colab tutorial designed for ssdlite320_mobilenet_v3_large.

My goal is to use Xplique's Saliency method to generate explanations for detections made by YOLOv7. However, I'm encountering a RuntimeError related to gradient computation during the explanation phase.

Approach:

To ensure the YOLOv7 model's output tensors retain the grad_fn attribute, I've wrapped the model in a custom PyTorch module, ensuring compatibility with Xplique's requirements:

import torch
import torch.nn as nn

class Ensemble(nn.ModuleList):
    def __init__(self):
        super(Ensemble, self).__init__()

    def forward(self, x, augment=False):
        y = []
        for module in self:
            y.append(module(x, augment=augment)[0])  # Ensure augment is passed correctly
        y = torch.cat(y, 1)  # nms ensemble
        return y, None  # inference, train output

# Load the YOLOv7 model and ensure it's ready for inference
model = Ensemble().to(device)  # Assuming device is defined
ckpt = torch.load(weights_path, map_location=device)  # Assuming weights_path is defined
model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())

# Compatibility updates
for m in model.modules():
    if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
        m.inplace = True  # PyTorch 1.7.0 compatibility
    elif type(m) is nn.Upsample:
        m.recompute_scale_factor = None  # PyTorch 1.11.0 compatibility

# Ensure your input tensor is on the same device as the model
visualizable_torch_inputs = visualizable_torch_inputs.to(device)

# Now perform inference
predictions = model(visualizable_torch_inputs)

With the model prepared, I performed inference to obtain predictions, which include the bounding boxes, confidence scores, and class IDs.

Issue:

When attempting to compute explanations using Xplique's Saliency method, the following RuntimeError is raised:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 3, 20, 20, 6]], which is output 0 of SigmoidBackward0, is at version 2; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

This occurs precisely at the line:

explanation = explainer.explain(processed_tf_inputs, man_bounding_box)
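For context, this class of autograd error can be reproduced in a few lines, independent of YOLOv7 or Xplique. Sigmoid saves its own output for the backward pass, so any subsequent in-place write to that output invalidates it (a minimal, self-contained sketch):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.sigmoid()   # SigmoidBackward0 saves y itself for the backward pass
y[0] = 0.0        # in-place write bumps y's version counter

try:
    y.sum().backward()  # backward needs the original, unmodified y
except RuntimeError as e:
    print(type(e).__name__, "-", e)  # "... modified by an inplace operation ..."
```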

Questions:

  • How can I resolve the RuntimeError related to gradient computation when using Xplique with a YOLOv7 model?

  • Are there specific considerations or modifications required to make YOLOv7 compatible with Xplique's explainability methods?

  • Is the issue related to how YOLOv7's outputs are structured or how gradients are being computed within the model or Xplique?

wadie999 avatar Feb 22 '24 23:02 wadie999

Hey there,

I am gonna investigate it. Could you please provide your code to get: processed_tf_inputs and man_bounding_box? Also, you said that you use a custom Yolov7. But can you tell me if you have the same error using the original one?

lucashervier avatar Feb 29 '24 13:02 lucashervier

Hello Lucas,

I successfully resolved the issue, which was related to in-place operations within the Detect class of the YOLO algorithm. Here is my approach in detail, to help others facing similar challenges.

Step 1: Load the Model

model = torch.hub.load('/content/drive/MyDrive/yolov7', 'custom', path_to_weights, source='local')

Step 2: Define an Ensemble Class

I created an ensemble of models to manage multiple versions or variations of YOLO models efficiently:

class Ensemble(nn.ModuleList):
    def __init__(self):
        super(Ensemble, self).__init__()

    def forward(self, x, augment=False):
        y = []
        for module in self:
            module_output = module(x.clone(), augment=augment)[0]
            y.append(module_output)
        aggregated_output = torch.cat(y, 1)
        return aggregated_output, None

Then, I initialized and loaded the ensemble model:


model = Ensemble().to(device)
ckpt = torch.load(weights_path, map_location=device)
model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())


Step 3: Compatibility Updates

for m in model.modules():
    if isinstance(m, (nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU)):
        m.inplace = True
    elif isinstance(m, nn.Upsample):
        m.recompute_scale_factor = None

Step 4: Preprocess the Image


crop_size = get_center_crop_size(image)
image_cropped = TF.center_crop(image, crop_size)
image_resized = TF.resize(image_cropped, [640, 640])
torch_image = TF.to_tensor(image_resized)
visualizable_torch_inputs = torch_image.unsqueeze(0)

Step 5: Define Model Wrapper

I created a custom wrapper for the YOLO model to handle inference and post-processing:

class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, torch_inputs):
        result = self.model(torch_inputs)
        print("Predictions shape before NMS:", result[0].shape)
        predictions = non_max_suppression(result[0])
        predictions = transform_input(predictions[0])
        # format_predictions (defined elsewhere) converts the dict into the tensor layout Xplique expects
        return torch.stack([self.format_predictions(pred) for pred in [predictions]], dim=0)

The transform_input function:

def transform_input(input_tensor):
    # Assuming input_tensor is of shape [N, 6] and format [x_min, y_min, x_max, y_max, score, class_id]

    # Separate the bounding boxes, scores, and class_ids
    boxes = input_tensor[:, :4]  # First 4 columns are the bounding box coordinates
    scores = input_tensor[:, 4]  # 5th column is the score
    labels = input_tensor[:, 5].long()  # 6th column is the class_id, converted to long for PyTorch compatibility

    # Wrap boxes, scores, and labels into a dictionary to match the desired output format
    output = {
        'boxes': boxes,
        'scores': scores,
        'labels': labels
    }

    return output
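A quick sanity check of this function with a dummy [N, 6] tensor (the function is repeated so the sketch is self-contained; the detection values are made up):

```python
import torch

def transform_input(input_tensor):
    boxes = input_tensor[:, :4]          # bounding box coordinates
    scores = input_tensor[:, 4]          # confidence scores
    labels = input_tensor[:, 5].long()   # class ids as int64
    return {'boxes': boxes, 'scores': scores, 'labels': labels}

# two hypothetical detections: [x_min, y_min, x_max, y_max, score, class_id]
dets = torch.tensor([[10., 20., 110., 220., 0.90, 1.],
                     [30., 40., 130., 240., 0.75, 3.]])
out = transform_input(dets)
print(out['boxes'].shape)  # torch.Size([2, 4])
print(out['labels'])       # tensor([1, 3])
```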

Step 6: Object Detection and Explanation

For processed_tf_inputs:

# preprocessed inputs for the model
processed_tf_inputs = tf.constant(
    visualizable_torch_inputs.permute([0, 2, 3, 1]).cpu().detach().numpy())

Then:

object_detection_model = ModelWrapper(model).eval()
wrapped_model = TorchWrapper(object_detection_model, device, is_channel_first=True)
tf_predictions = wrapped_model(processed_tf_inputs)
man_bounding_box = tf_predictions[:, 0]

Step 7: Saliency Explanation

I attempted to generate a saliency explanation, which led to the encountered error:

explainer = Saliency(wrapped_model, operator=xplique.Tasks.OBJECT_DETECTION, batch_size=BATCH_SIZE)
explanation = explainer.explain(processed_tf_inputs, man_bounding_box)

The traceback:

I external/local_tsl/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
/home/wadie/.local/lib/python3.10/site-packages/torch/autograd/__init__.py:251: UserWarning: Error detected in SigmoidBackward0. Traceback of forward call that caused the error:
  File "xplique/explain.py", line 318, in <module>
    explanation = explainer.explain(processed_tf_inputs, man_bounding_box)
  File "/.local/lib/python3.10/site-packages/xplique/attributions/base.py", line 32, in sanitize
    return explanation_method(self, inputs, targets, *args)
  File "/.local/lib/python3.10/site-packages/xplique/attributions/base.py", line 221, in explain
    explanations = explain_method(self, inputs, targets)
  File "/.local/lib/python3.10/site-packages/xplique/attributions/saliency.py", line 84, in explain
    gradients = self.batch_gradient(self.model, inputs, targets, self.batch_size)
  File "/.local/lib/python3.10/site-packages/xplique/commons/operators_operations.py", line 193, in batched_operator
    results = tf.concat([
  File "/.local/lib/python3.10/site-packages/xplique/commons/operators_operations.py", line 194, in <listcomp>
    operator(model, x, y)
  File "/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 809, in __call__
    return self._python_function(*args, **kwds)
  File "/.local/lib/python3.10/site-packages/xplique/commons/operators_operations.py", line 168, in gradient
    scores = operator(model, inputs, targets)
  File "/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 809, in __call__
    return self._python_function(*args, **kwds)
  File "/.local/lib/python3.10/site-packages/xplique/commons/operators.py", line 223, in object_detection_operator
    objects = model(inputs)
  File "/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/keras/src/engine/training.py", line 590, in __call__
    return super().__call__(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
    return fn(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/tensorflow/python/ops/custom_gradient.py", line 343, in __call__
    return self._d(self._f, a, k)
  File "/.local/lib/python3.10/site-packages/tensorflow/python/ops/custom_gradient.py", line 297, in decorated
    return _eager_mode_decorator(wrapped, args, kwargs)
  File "/.local/lib/python3.10/site-packages/tensorflow/python/ops/custom_gradient.py", line 543, in _eager_mode_decorator
    result, grad_fn = f(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/xplique/wrappers/pytorch.py", line 92, in call
    outputs = self.model(torch_inputs)
  File "xplique/explain.py", line 273, in __call__
    result = model(torch_inputs)
  File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "xplique/explain.py", line 201, in forward
    y.append(module(x, augment=augment)[0])  # Ensure augment is passed correctly
  File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "xplique/yolov7/models/yolo.py", line 629, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "xplique/yolov7/models/yolo.py", line 663, in forward_once
    x = m(x)  # run the layer
  File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "xplique/yolov7/models/yolo.py", line 60, in forward
    y = x[i].sigmoid()
 (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:114.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

Solution: I identified the in-place operation causing the issue in yolo.py (lines 56-57) and modified it to avoid in-place writes:

from

y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]

to

# Modified to avoid in-place operations
xy_update = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]
wh_update = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]
y = torch.cat((xy_update, wh_update, y[..., 4:]), dim=-1)

This modification resolved the error, allowing gradient computation to proceed without issues.
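The fix can be checked in isolation: computing the updated slices out-of-place and reassembling with torch.cat leaves the saved sigmoid output untouched, so backward succeeds (a minimal sketch with dummy grid/stride/anchor values, not the actual YOLOv7 code):

```python
import torch

x = torch.randn(1, 3, 6, requires_grad=True)
y = x.sigmoid()                             # backward needs this tensor unmodified
grid, stride, anchor_grid = 0.5, 8.0, 2.0   # hypothetical scalar stand-ins

# out-of-place: build new tensors instead of writing into y's slices
xy = (y[..., 0:2] * 2. - 0.5 + grid) * stride
wh = (y[..., 2:4] * 2) ** 2 * anchor_grid
y_out = torch.cat((xy, wh, y[..., 4:]), dim=-1)

y_out.sum().backward()                      # no RuntimeError
print(x.grad.shape)                         # torch.Size([1, 3, 6])
```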

wadie999 avatar Feb 29 '24 21:02 wadie999