RuntimeError during Gradient Computation with Custom YOLOv7 Model Wrapper in PyTorch for Xplique Object Detection Explainability
I'm working on object detection explainability using the Xplique library with a custom YOLOv7 model, following the approach of the Google Colab tutorial written for ssdlite320_mobilenet_v3_large.
My goal is to use Xplique's Saliency method to generate explanations for detections made by YOLOv7. However, I'm encountering a RuntimeError related to gradient computation during the explanation phase.
Approach:
To ensure the YOLOv7 model's output tensors retain their grad_fn attribute, and thus remain compatible with Xplique's requirements, I wrapped the model in a custom PyTorch module:
import torch
import torch.nn as nn

class Ensemble(nn.ModuleList):
    def __init__(self):
        super(Ensemble, self).__init__()

    def forward(self, x, augment=False):
        y = []
        for module in self:
            y.append(module(x, augment=augment)[0])  # ensure augment is passed through
        y = torch.cat(y, 1)  # NMS ensemble
        return y, None  # inference, train output

# Load the YOLOv7 model and ensure it's ready for inference
model = Ensemble().to(device)  # assuming device is defined
ckpt = torch.load(weights_path, map_location=device)  # assuming weights_path is defined
model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())

# Compatibility updates
for m in model.modules():
    if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
        m.inplace = True  # PyTorch 1.7.0 compatibility
    elif type(m) is nn.Upsample:
        m.recompute_scale_factor = None  # PyTorch 1.11.0 compatibility

# Ensure the input tensor is on the same device as the model
visualizable_torch_inputs = visualizable_torch_inputs.to(device)

# Now perform inference
predictions = model(visualizable_torch_inputs)
With the model prepared, I performed inference to obtain predictions, which include the bounding boxes, confidence scores, and class IDs.
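For reference, YOLOv7's raw inference output is a tensor of shape [batch, num_predictions, 5 + num_classes]. The sketch below decomposes those columns on a fake tensor (it assumes only torch is installed and uses 80 classes as in COCO; the column layout follows the standard YOLO detection head):

```python
import torch

num_classes = 80
# Fake raw output: 1 image, 3 candidate predictions, 5 + num_classes columns
raw = torch.rand(1, 3, 5 + num_classes)

boxes_xywh = raw[..., 0:4]   # center-x, center-y, width, height
objectness = raw[..., 4]     # objectness confidence
class_probs = raw[..., 5:]   # per-class probabilities

print(boxes_xywh.shape)   # torch.Size([1, 3, 4])
print(class_probs.shape)  # torch.Size([1, 3, 80])
```

Non-max suppression then reduces this tensor to the final boxes, scores, and class IDs.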
Issue:
When attempting to compute explanations using Xplique's Saliency method, the following RuntimeError is raised:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 3, 20, 20, 6]], which is output 0 of SigmoidBackward0, is at version 2; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
This occurs precisely at the line:
explanation = explainer.explain(processed_tf_inputs, man_bounding_box)
Questions:
- How can I resolve the RuntimeError related to gradient computation when using Xplique with a YOLOv7 model?
- Are there specific considerations or modifications required to make YOLOv7 compatible with Xplique's explainability methods?
- Is the issue related to how YOLOv7's outputs are structured, or to how gradients are computed within the model or Xplique?
Hey there,
I'm going to investigate this. Could you please provide the code you use to produce processed_tf_inputs and man_bounding_box?
Also, you said you use a custom YOLOv7; can you tell me whether you get the same error with the original one?
Hello Lucas,
I successfully resolved the issue, which was caused by in-place operations in the Detect class of the YOLO model. Here is my approach in detail, to help others facing similar challenges.
Step 1: Load the Model
model = torch.hub.load('/content/drive/MyDrive/yolov7', 'custom', path_to_weights, source='local')
Step 2: Define an Ensemble Class
I created an ensemble of models to manage multiple versions or variations of YOLO models efficiently:
class Ensemble(nn.ModuleList):
    def __init__(self):
        super(Ensemble, self).__init__()

    def forward(self, x, augment=False):
        y = []
        for module in self:
            module_output = module(x.clone(), augment=augment)[0]
            y.append(module_output)
        aggregated_output = torch.cat(y, 1)
        return aggregated_output, None
Then, I initialized and loaded the ensemble model:
model = Ensemble().to(device)
ckpt = torch.load(weights_path, map_location=device)
model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())
Step 3: Compatibility Updates
for m in model.modules():
    if isinstance(m, (nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU)):
        m.inplace = True
    elif isinstance(m, nn.Upsample):
        m.recompute_scale_factor = None
Step 4: Preprocess the Image
crop_size = get_center_crop_size(image)
image_cropped = TF.center_crop(image, crop_size)
image_resized = TF.resize(image_cropped, [640, 640])
torch_image = TF.to_tensor(image_resized)
visualizable_torch_inputs = torch_image.unsqueeze(0)
Step 5: Define a Model Wrapper
I created a custom wrapper for the YOLO model to handle inference and post-processing:
class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, torch_inputs):
        result = self.model(torch_inputs)
        print("Predictions shape before NMS:", result[0].shape)
        predictions = non_max_suppression(result[0])
        predictions = transform_input(predictions[0])
        # format_predictions (defined elsewhere) turns the dict into a tensor
        return torch.stack([self.format_predictions(pred) for pred in [predictions]], dim=0)
The transform_input function:
def transform_input(input_tensor):
    # input_tensor has shape [N, 6] in the format [x_min, y_min, x_max, y_max, score, class_id]
    boxes = input_tensor[:, :4]         # first 4 columns: bounding-box coordinates
    scores = input_tensor[:, 4]         # 5th column: confidence score
    labels = input_tensor[:, 5].long()  # 6th column: class_id, cast to long for PyTorch compatibility
    # Wrap boxes, scores, and labels into a dictionary matching the desired output format
    output = {
        'boxes': boxes,
        'scores': scores,
        'labels': labels,
    }
    return output
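As a quick sanity check, transform_input can be exercised on a dummy [N, 6] tensor (a sketch assuming only torch is installed; the two detections are made up):

```python
import torch

def transform_input(input_tensor):
    # Same splitting as above: [N, 6] detections -> dict of boxes/scores/labels
    boxes = input_tensor[:, :4]
    scores = input_tensor[:, 4]
    labels = input_tensor[:, 5].long()
    return {'boxes': boxes, 'scores': scores, 'labels': labels}

# Two fake detections: [x_min, y_min, x_max, y_max, score, class_id]
dummy = torch.tensor([[10., 20., 110., 220., 0.9, 1.],
                      [30., 40., 130., 240., 0.8, 2.]])
out = transform_input(dummy)
print(out['boxes'].shape)   # torch.Size([2, 4])
print(out['labels'].dtype)  # torch.int64
```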
Step 6: Object Detection and Explanation
For processed_tf_inputs:
# preprocessed inputs for the model
processed_tf_inputs = tf.constant(
    visualizable_torch_inputs.permute([0, 2, 3, 1]).cpu().detach().numpy())
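The permute([0, 2, 3, 1]) is there because PyTorch tensors are channel-first (NCHW) while TensorFlow expects channel-last (NHWC); a minimal sketch with a toy tensor (assuming only torch is installed):

```python
import torch

# PyTorch convention: [batch, channels, height, width]
nchw = torch.rand(1, 3, 640, 640)
# Reorder to TensorFlow's [batch, height, width, channels] before tf.constant
nhwc = nchw.permute(0, 2, 3, 1)
print(nhwc.shape)  # torch.Size([1, 640, 640, 3])
```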
Then:
object_detection_model = ModelWrapper(model).eval()
wrapped_model = TorchWrapper(object_detection_model, device, is_channel_first=True)
tf_predictions = wrapped_model(processed_tf_inputs)
man_bounding_box = tf_predictions[:, 0]
Step 7: Saliency Explanation
I then attempted to generate a saliency explanation, which led to the error:
explainer = Saliency(wrapped_model, operator=xplique.Tasks.OBJECT_DETECTION, batch_size=BATCH_SIZE)
explanation = explainer.explain(processed_tf_inputs, man_bounding_box)
The traceback:
I external/local_tsl/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
/home/wadie/.local/lib/python3.10/site-packages/torch/autograd/__init__.py:251: UserWarning: Error detected in SigmoidBackward0. Traceback of forward call that caused the error:
File "xplique/explain.py", line 318, in <module>
explanation = explainer.explain(processed_tf_inputs, man_bounding_box)
File "/.local/lib/python3.10/site-packages/xplique/attributions/base.py", line 32, in sanitize
return explanation_method(self, inputs, targets, *args)
File "/.local/lib/python3.10/site-packages/xplique/attributions/base.py", line 221, in explain
explanations = explain_method(self, inputs, targets)
File "/.local/lib/python3.10/site-packages/xplique/attributions/saliency.py", line 84, in explain
gradients = self.batch_gradient(self.model, inputs, targets, self.batch_size)
File "/.local/lib/python3.10/site-packages/xplique/commons/operators_operations.py", line 193, in batched_operator
results = tf.concat([
File "/.local/lib/python3.10/site-packages/xplique/commons/operators_operations.py", line 194, in <listcomp>
operator(model, x, y)
File "/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 809, in __call__
return self._python_function(*args, **kwds)
File "/.local/lib/python3.10/site-packages/xplique/commons/operators_operations.py", line 168, in gradient
scores = operator(model, inputs, targets)
File "/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 809, in __call__
return self._python_function(*args, **kwds)
File "/.local/lib/python3.10/site-packages/xplique/commons/operators.py", line 223, in object_detection_operator
objects = model(inputs)
File "/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/keras/src/engine/training.py", line 590, in __call__
return super().__call__(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/tensorflow/python/ops/custom_gradient.py", line 343, in __call__
return self._d(self._f, a, k)
File "/.local/lib/python3.10/site-packages/tensorflow/python/ops/custom_gradient.py", line 297, in decorated
return _eager_mode_decorator(wrapped, args, kwargs)
File "/.local/lib/python3.10/site-packages/tensorflow/python/ops/custom_gradient.py", line 543, in _eager_mode_decorator
result, grad_fn = f(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/xplique/wrappers/pytorch.py", line 92, in call
outputs = self.model(torch_inputs)
File "xplique/explain.py", line 273, in __call__
result = model(torch_inputs)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "xplique/explain.py", line 201, in forward
y.append(module(x, augment=augment)[0]) # Ensure augment is passed correctly
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "xplique/yolov7/models/yolo.py", line 629, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "xplique/yolov7/models/yolo.py", line 663, in forward_once
x = m(x) # run the layer
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "xplique/yolov7/models/yolo.py", line 60, in forward
y = x[i].sigmoid()
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:114.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Solution: I identified the in-place operations causing the issue in yolo.py (lines 56-57) and modified them to avoid writing into the sigmoid output in place:
From:
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]
to:
# Modified to avoid in-place operations
xy_update = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]
wh_update = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]
y = torch.cat((xy_update, wh_update, y[..., 4:]), dim=-1)
This modification resolved the error, allowing gradient computation to proceed without issues.
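The failure mode can be reproduced in isolation. The sketch below (using only torch, with toy tensors rather than the YOLOv7 head) triggers the same SigmoidBackward0 version error via an in-place slice assignment, and shows that the out-of-place torch.cat rewrite avoids it:

```python
import torch

# In-place version: writing into the output of sigmoid() bumps its version
# counter, invalidating the tensor SigmoidBackward0 saved for the backward pass.
x = torch.randn(2, 4, requires_grad=True)
y = x.sigmoid()
inplace_failed = False
try:
    y[:, 0:2] = y[:, 0:2] * 2.0 - 0.5  # in-place slice assignment, as in yolo.py
    y.sum().backward()
except RuntimeError:
    inplace_failed = True  # "modified by an inplace operation" error
print("in-place version raises RuntimeError:", inplace_failed)

# Out-of-place rewrite, mirroring the fix: build new tensors, then concatenate.
x2 = torch.randn(2, 4, requires_grad=True)
y2 = x2.sigmoid()
xy = y2[:, 0:2] * 2.0 - 0.5
y2 = torch.cat((xy, y2[:, 2:]), dim=1)
y2.sum().backward()  # succeeds: no saved tensor was overwritten
print("out-of-place gradient shape:", tuple(x2.grad.shape))
```

Rebinding y2 to the torch.cat result leaves the original sigmoid output untouched, so autograd can still replay the backward pass.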