kaolin 0.17, torch 2.4.0 backward pass error
I get the following error when using the render function below to render a scene and then deform a set of points and faces to match the scene:
ERROR
  File "C:...\lib\site-packages\kaolin\render\mesh\rasterization.py", line 357, in backward
    face_vertices_image, face_features = ctx.saved_tensors
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 512, 3]], which is output 0 of RasterizeCudaBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
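For readers unfamiliar with the "is at version 2; expected version 1" wording: autograd keeps a version counter on every tensor, bumps it on each in-place write, and checks during backward that tensors saved in the forward pass still have the version they were saved with. The sketch below reproduces the same class of error on the CPU. It is only an illustration: `torch.exp` (which saves its output for backward) stands in for the rasterizer, and `_version` is an internal attribute used here purely for inspection.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.exp(x)              # exp saves its output for the backward pass

version_before = y._version   # internal version counter, read for illustration only
y.mul_(2.0)                   # in-place write bumps the counter
version_after = y._version

try:
    y.sum().backward()        # backward finds the saved output at a newer version
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(version_before, version_after)
print(error_message)
```

Replacing `y.mul_(2.0)` with `y = y * 2.0` leaves the saved tensor untouched, and the backward pass then runs without error.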
CODE
import open3d as o3d
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.transforms import ToPILImage
import numpy as np
import kaolin as kal
from kaolin.render.camera import perspective_camera
from kaolin.ops.coords import spherical2cartesian
from kaolin.render.camera import blender_coords
from kaolin.render.easy_render import render_mesh, default_lighting
from kaolin.render.camera import Camera
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from copy import deepcopy
import logging
import warnings

logging.getLogger().setLevel(logging.ERROR)
warnings.filterwarnings("ignore", category=UserWarning, message="WARNING:root:Missing uvmap; cannot texturemap materials")

def create_camera(azimuth, elevation, device="cuda"):
    azimuth_tensor = torch.deg2rad(torch.tensor(azimuth, dtype=torch.float32, device=device))
    elevation_tensor = torch.deg2rad(torch.tensor(elevation, dtype=torch.float32, device=device))
    x, y, z = spherical2cartesian(azimuth=azimuth_tensor, elevation=elevation_tensor, distance=2.0)
    eye = torch.tensor([x, y, z], dtype=torch.float32, device=device)
    at = torch.tensor([0.0, 0.0, 0.0], dtype=torch.float32, device=device)
    forward = -eye / torch.linalg.norm(eye)
    # Fixed world up vector
    world_up = torch.tensor([0.0, 0.0, 1.0], dtype=torch.float32, device=device)
    right = torch.cross(world_up, forward)
    right = right / torch.linalg.norm(right)  # Normalize right vector
    up = torch.cross(forward, right)  # Up vector from forward and right
    camera = Camera.from_args(eye=eye, at=at, up=up, fov=45, width=512, height=512, device="cuda")
    camera.extrinsics.change_coordinate_system(blender_coords())
    return camera

def create_camera_matrices():
    camera_matrices = []
    for angle in range(0, 360, 10):
        camera_matrices.append(create_camera(azimuth=angle, elevation=0))
    return camera_matrices
if __name__ == "__main__":
- SCENE1: load scene1 from .off with o3d, convert the vertices and faces to torch, and create a SurfaceMesh object to represent this scene
- SCENE2: initialize a mesh with sufficient triangles and vertices (e.g., a flat mesh) and convert it to a SurfaceMesh object
- train:
  - make the vertices of SCENE2 trainable
  - use the cameras created by create_camera_matrices()
  - in every epoch, use all cameras to generate target_images from SCENE1 via render_mesh(camera, SCENE1, lighting) for camera in cameras
  - do the same to generate source_images
  - optimize using the summed loss of source_images - target_images
Thanks for your interest, @BarakeelFanseuKamhoua. Would you be able to post the code where you construct the scene being optimized, run rendering and do backward computation?
I suspect that your issue might be related to caching of mesh attributes. See "Optimization and Gradients" in this notebook.
I have looked at the recommended notebook, but the problem is still not solved. Please see a dummy sample of the source_mesh construction and a dummy training loop below.
CONSTRUCT SCENE DUMMY
def create_flat_mesh(n):
    # Generate a grid of points in 2D
    x = torch.linspace(0, 1, steps=n)
    y = torch.linspace(0, 1, steps=n)
    xv, yv = torch.meshgrid(x, y, indexing="ij")

    # Flatten the grid into a list of vertices
    vertices = torch.stack([xv.flatten(), yv.flatten(), torch.zeros(n * n)], dim=1)

    # Create faces (triangles) for the grid
    faces = []
    for i in range(n - 1):
        for j in range(n - 1):
            # Vertices of the current cell
            v0 = i * n + j
            v1 = v0 + 1
            v2 = v0 + n
            v3 = v2 + 1
            # Add two triangles for the cell
            faces.append([v0, v1, v2])  # First triangle
            faces.append([v1, v3, v2])  # Second triangle
    faces = torch.tensor(faces, dtype=torch.int64)

    mesh = kal.rep.SurfaceMesh(vertices, faces, allow_auto_compute=True)
    mesh.vertices = kal.ops.pointcloud.center_points(mesh.vertices.unsqueeze(0),
                                                     normalize=True).squeeze(0)
    return mesh.cuda()

source_mesh = create_flat_mesh(n=100)  # will have n*n vertices
TRAIN CODE DUMMY
def shape_matching(source_mesh, target_mesh, camera_matrices,
                   num_epochs=100, lr=0.01):
    source_mesh.vertices = source_mesh.vertices.requires_grad_(True)
    lighting = default_lighting().to(source_mesh.vertices.device)
    optimizer = optim.Adam([source_mesh.vertices], lr=lr)
    optimized_store = []
    target_store = []
    for epoch in range(num_epochs):
        optimizer.zero_grad()
        loss = 0
        total_loss = 0
        for camera in camera_matrices:
            with torch.jit.optimized_execution(False):
                with torch.no_grad():
                    target_image = render_mesh(camera, target_mesh, lighting)["render"].clamp(0,1)
                source_image = render_mesh(camera, source_mesh, lighting)["render"].clamp(0,1)
            target_store.append(target_image.squeeze(0).transpose(0, 2).detach())
            optimized_store.append(source_image.squeeze(0).transpose(0, 2).detach())
            loss = torch.mean((target_image - source_image) ** 2)
            total_loss += loss
            loss.backward()
        optimizer.step()
        print(f"Epoch {epoch}, Loss: {total_loss.item()}")
@BarakeelFanseuKamhoua Thanks for the details! I noticed you run the rendering with torch.no_grad(). That is not correct, as you need the gradients in order to optimize.
In addition, you are also calling detach on the output image. That also stops your gradient flow and prevents you from optimizing.
If you remove both those things and receive the same error, please run the following on your mesh being optimized right before and right after the first rendering and paste the output here:
# Shows detailed attributes of all tensors, such as requires_grad and if it's a leaf node
print(f'\nDetailed string of {mesh.to_string(detailed=True, print_stats=True)}')
Tip: I would suggest first writing a trivial optimization loop on one of the mesh attributes, without even any rendering, and making sure the loss can go down (e.g. vertex distance to a point). That way you can isolate if it is the rendering that is causing the issue, or there is an error in the training loop itself.
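A minimal sketch of such a sanity check, assuming plain PyTorch on the CPU with no kaolin and no rendering. The random `vertices` tensor stands in for `mesh.vertices`, and `target_point` is a made-up target; neither comes from the code posted above.

```python
import torch

torch.manual_seed(0)

# Stand-in for mesh.vertices: 100 random points in the unit cube
vertices = torch.rand(100, 3, requires_grad=True)
# Hypothetical target: pull every vertex toward the cube center
target_point = torch.tensor([0.5, 0.5, 0.5])

optimizer = torch.optim.Adam([vertices], lr=0.01)

losses = []
for step in range(50):
    optimizer.zero_grad()
    # Mean squared distance from each vertex to the target point
    loss = ((vertices - target_point) ** 2).sum(dim=1).mean()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

If the loss decreases here but the same loop fails once `render_mesh` is added, the problem is likely in the rendering path rather than in the training loop.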
Hello, thanks again for your quick responses. Removing torch.no_grad() did not solve the problem, so I printed the mesh details before and after rendering. Please find them below:
Before rendering source_mesh
Before rendering; Detailed string of SurfaceMesh object with batching strategy NONE
vertices: [10000, 3] (torch.float32)[cuda:0] - [min -0.5000, max 0.5000, mean 0.0123] - req_grad=True, is_leaf=True, layout=torch.strided
faces: [19602, 3] (torch.int64)[cuda:0] - [min 0.0000, max 9999.0000, mean 4999.5000] - req_grad=False, is_leaf=True, layout=torch.strided
face_vertices: if possible, computed on access from: (faces, vertices)
face_normals: if possible, computed on access from: (normals, face_normals_idx) or (vertex_normals, faces) or (vertices, faces)
face_uvs: if possible, computed on access from: (uvs, face_uvs_idx)
vertex_normals: if possible, computed on access from: (faces, face_normals)
vertex_tangents: if possible, computed on access from: (faces, face_vertices, face_uvs, vertex_normals)
vertex_colors: if possible, computed on access from: (faces, face_colors)
vertex_features: if possible, computed on access from: (faces, face_features)
face_tangents: if possible, computed on access from: (faces, vertex_tangents)
face_colors: if possible, computed on access from: (faces, vertex_colors)
face_features: if possible, computed on access from: (faces, vertex_features)
After rendering source_mesh
After rendering; Detailed string of SurfaceMesh object with batching strategy NONE
vertices: [10000, 3] (torch.float32)[cuda:0] - [min -0.5000, max 0.5000, mean 0.0123] - req_grad=True, is_leaf=True, layout=torch.strided
faces: [19602, 3] (torch.int64)[cuda:0] - [min 0.0000, max 9999.0000, mean 4999.5000] - req_grad=False, is_leaf=True, layout=torch.strided
face_vertices: if possible, computed on access from: (faces, vertices)
face_normals: if possible, computed on access from: (normals, face_normals_idx) or (vertex_normals, faces) or (vertices, faces)
face_uvs: if possible, computed on access from: (uvs, face_uvs_idx)
vertex_normals: if possible, computed on access from: (faces, face_normals)
vertex_tangents: if possible, computed on access from: (faces, face_vertices, face_uvs, vertex_normals)
vertex_colors: if possible, computed on access from: (faces, face_colors)
vertex_features: if possible, computed on access from: (faces, face_features)
face_tangents: if possible, computed on access from: (faces, vertex_tangents)
face_colors: if possible, computed on access from: (faces, vertex_colors)
face_features: if possible, computed on access from: (faces, vertex_features)
Thank you for checking. I can reproduce this, looks like a bug. @Caenorst can you help look into this?
I faced the same problem, and this line yields the error: https://github.com/NVIDIAGameWorks/kaolin/blob/b6cf8073edbe9b5469f643d3b9cf57b0f43651db/kaolin/render/easy_render/mesh.py#L106
I modified this to
```
im_base_normals = im_base_normals * im_normal_sign.unsqueeze(-1)
```
and it resolved the problem.
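The workaround presumably helps because the replacement multiplies out of place, so the tensor saved for the backward pass is never overwritten. This assumes the original line updated `im_base_normals` in place, which I have not verified against that commit. The sketch below shows the two patterns side by side on the CPU. `backward_succeeds` is a hypothetical helper, and `torch.tanh` (which saves its output for backward) stands in for whatever produces the normals in kaolin.

```python
import torch

def backward_succeeds(in_place):
    """Return True if backward runs after flipping signs (hypothetical helper)."""
    raw = torch.randn(4, 3, requires_grad=True)
    normals = torch.tanh(raw)                    # tanh keeps its output for backward
    sign = torch.tensor([1.0, -1.0, 1.0, -1.0])  # per-row sign, no gradient needed
    if in_place:
        normals *= sign.unsqueeze(-1)            # overwrites the tensor tanh saved
    else:
        normals = normals * sign.unsqueeze(-1)   # new tensor, saved one untouched
    try:
        normals.sum().backward()
        return True
    except RuntimeError:
        return False

print("in-place:", backward_succeeds(True))
print("out-of-place:", backward_succeeds(False))
```

The out-of-place form allocates one extra tensor per call, which is usually a negligible cost next to the rasterization itself.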
@shumash
Stale issue, please reopen if still relevant