Batch size does not match point.offset.shape[0]
When visualizing ScanNet, I noticed that during training, points from multiple scenes appear to be placed in a single batch. Is this a bug? Will this affect serialization? How can I ensure that each training batch contains points from only a single scene?
I had problems with batching as well. Apparently the collate function of Pointcept works in an unusual way. I solved my batching problem by returning data with an extra dimension (of size 1) so the collate function creates batches the way I wanted. Hope it can help. Sorry for the not-so-technical answer, but I just started working with this framework.
Hi, I'm having issues with batching as well; the collate is indeed confusing. Can you briefly describe your solution?
You mainly have to look at this image. Basically, instead of adding an extra batch dimension to make your tensor of shape [B, N, C], it stacks everything along the first dimension and uses offset to know where each sample begins and ends. So you end up dealing with tensors of shape [B*N, C], which is not what you would normally do in a training loop. My suggestion is to stick to this mechanism and work with the default collate function; otherwise you risk messing things up at the start. As I said, I started working with this framework about 4 days ago, so I am still in the process of understanding how things work. Moreover, I have to run the training code on a remote cluster, so it's a little hard to debug properly. I'm pasting below the code of the model I am currently using to regress a point on a mesh, so that you can either use it as a reference or point out potential bugs.
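To make the offset convention concrete before the model code, here is a minimal sketch; the sample sizes and channel count are made up for illustration:

import torch

# Hypothetical flat batch of two samples with 4 and 6 points, 3 channels each.
feat = torch.randn(10, 3)       # stacked as [B*N, C] = [4 + 6, 3]
offset = torch.tensor([4, 10])  # cumulative point counts per sample

# Recover per-sample tensors from the flat batch.
start = 0
for end in offset.tolist():
    sample = feat[start:end]    # points belonging to one sample
    print(sample.shape)         # torch.Size([4, 3]), then torch.Size([6, 3])
    start = end

And here is the model code: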
import os
import json
from collections import defaultdict

import torch
import torch.nn as nn
from torch_scatter import segment_csr

# Imports below assume Pointcept's standard module layout.
from pointcept.models import build_model
from pointcept.models.utils.structure import Point


class VoxelBracketPredictor(nn.Module):
    def __init__(
        self,
        backbone,
        backbone_out_channels=96,
        output_dim=3,  # 3D point coordinates
        save_predictions=False,
        output_dir: str = "output",
    ):
        super().__init__()
        self.backbone = build_model(backbone)
        self.output_dir = output_dir
        self.save_predictions = save_predictions
        # Regression head: maps pooled features to 3D point coordinates
        self.head = nn.Sequential(
            nn.Linear(backbone_out_channels, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.3),
            nn.Linear(256, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.3),
            nn.Linear(128, output_dim),
        )

    def _save(self, input_dict: dict, bracket_point_pred: torch.Tensor) -> None:
        """
        Writes the predicted points as JSON files.

        Each file is named {sample_name}_epoch{N}.json, where N counts how many
        times this function has been called for that sample.

        Args:
            input_dict (dict): Contains "name" (list of sample names without extension).
            bracket_point_pred (torch.Tensor): Predictions of shape [B, 3].
        """
        # Initialize a per-sample counter the first time this is called
        if not hasattr(self, "_save_counter"):
            self._save_counter = defaultdict(int)
        os.makedirs(self.output_dir, exist_ok=True)
        names = input_dict["name"]
        preds = bracket_point_pred.detach().cpu().numpy()  # numpy for JSON serialization
        for name, coords in zip(names, preds):
            # Increment the counter for this specific sample
            self._save_counter[name] += 1
            epoch_idx = self._save_counter[name]
            filename = f"{name}_epoch{epoch_idx}.json"
            filepath = os.path.join(self.output_dir, filename)
            with open(filepath, "w") as f:
                json.dump({"coords": coords.tolist()}, f, indent=4)

    def forward(self, input_dict):
        point = self.backbone(input_dict)
        # Handle the Point structure returned by voxel-based backbones
        if isinstance(point, Point):
            # Global average pooling per sample: pad offset with a leading zero
            # to turn it into a CSR indptr, then reduce each segment to its mean
            point.feat = segment_csr(
                src=point.feat,
                indptr=nn.functional.pad(point.offset, (1, 0)),
                reduce="mean",
            )
            feat = point.feat
        else:
            feat = point
        # Predict one 3D point per sample
        bracket_point_pred = self.head(feat)
        # Apparently the output dictionary only needs the loss value
        out = {}
        # Compute MSE loss if ground truth is available
        if "bracket_point" in input_dict:
            # The target arrives flattened as [B*3] while bracket_point_pred
            # is [B, 3], which breaks nn.functional.mse_loss, so reshape the
            # target to match the prediction.
            target = input_dict["bracket_point"].view_as(bracket_point_pred)
            if self.save_predictions and not self.training:
                self._save(input_dict, bracket_point_pred)
            loss = nn.functional.mse_loss(bracket_point_pred, target)
            out["loss"] = loss
        return out
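In case it is useful: to let Pointcept's trainer build this model from a config, I register it with the models registry. This follows the standard registry pattern as far as I understand it; treat the exact import path as an assumption on my part:

from pointcept.models.builder import MODELS  # assumed import path

@MODELS.register_module("VoxelBracketPredictor")
class VoxelBracketPredictor(nn.Module):
    ...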
@theElandor Thank you for helping clarify the batch design. This is at the core of real point cloud learning, since real-world point clouds have uneven sizes. Please let me know about any issue that keeps you from understanding our architecture.
@theElandor Thank you for the detailed explanation and the code example, I really appreciate it!
I completely understand the flat batching design used in Pointcept: points from multiple samples are concatenated into a single tensor of shape [total_N, C], and offset (or indptr) is used to mark sample boundaries.
However, my original concern wasn't about the format of batching, but about the composition of each batch. While visualizing ScanNet inputs, I initially suspected a visualization error. To verify, I directly printed the length of offset during training and observed unexpected behavior.
What I'm actually trying to ask is: the length of offset does not match my expectation of the batch composition. Specifically, in single-GPU training I expected offset.shape[0] == batch_size, but the observed lengths deviate from this value. This makes me question whether the data loading pipeline is correctly assigning one scene per sample (i.e., per offset segment).
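For reference, this is the kind of check I ran (a sketch; input_dict and batch_size stand in for whatever the training loop actually provides):

# Inside the training loop, before the forward pass:
offset = input_dict["offset"]
# With one scene per sample, I expected one offset entry per scene:
assert offset.shape[0] == batch_size, (
    f"expected {batch_size} segments, got {offset.shape[0]}"
)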
To clarify my core concern: Does each segment defined by offset[i-1]:offset[i] strictly correspond to exactly one ScanNet scene?
Thank you for your expertise — this would resolve my uncertainty about data integrity.
Hey, we also have a mechanism triggered by mix_prob (usually 0.8), which mixes point clouds within a batch by merging adjacent samples; maybe this is what confused you.
https://github.com/Pointcept/Pointcept/blob/main/pointcept/datasets/utils.py#L75
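For readers hitting the same confusion: the linked collate function, with probability mix_prob, merges adjacent samples pairwise (Mix3D-style) by dropping every other cumulative offset. A simplified sketch of the idea, not the exact library code:

import random
import torch

def mix_offsets(offset, mix_prob=0.8):
    # With probability mix_prob, drop every other cumulative offset so that
    # adjacent samples merge pairwise; a batch of 8 scenes then collapses
    # into 4 segments and offset.shape[0] is half the configured batch size.
    if random.random() < mix_prob:
        offset = torch.cat([offset[1:-1:2], offset[-1:]])
    return offset

# offsets for 4 samples of 5 points each -> 2 merged segments
print(mix_offsets(torch.tensor([5, 10, 15, 20]), mix_prob=1.0))  # tensor([10, 20])

Note that the points themselves are untouched; only the segment boundaries change, so each merged segment now spans two scenes.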
@Gofinge Thank you for the explanation! Now I understand. Thanks again for providing such a complete and practical project!