Batch size does not match the point.offset.shape[0]

gitKincses opened this issue 3 months ago · 3 comments

When visualizing ScanNet, I noticed that during training, points from multiple scenes appear to be placed in a single batch. Is this a bug? Will this affect serialization? How can I ensure that only points from a single scene are included in each training batch?

[Image: visualization of a training batch containing points from multiple scenes]

gitKincses avatar Oct 04 '25 12:10 gitKincses

I had problems with batching as well. Apparently the collate function of Pointcept works in a weird way. I solved my batching problem by returning data with an extra dimension (of 1) to make the collate create batches the way I wanted. Hope it can help. Sorry for the not super-technical answer, but I just started working with this framework.
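
Roughly something like this (a hedged sketch; load_coords is a made-up helper, and this only works if every sample has the same number of points N):

import torch

# Hypothetical __getitem__ showing the workaround: the default collate
# concatenates tensors along dim 0, so returning each field with a leading
# singleton dimension makes batches come out as [B, N, C] instead of the
# flat-packed [B*N, C].
def __getitem__(self, idx):
    coord = self.load_coords(idx)         # [N, 3]; load_coords is made up
    return {"coord": coord.unsqueeze(0)}  # [1, N, 3] -> collated to [B, N, 3]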

theElandor avatar Oct 22 '25 14:10 theElandor

Hi, I'm having issues with the batching as well; the collate is indeed weird. Can you give a brief explanation of your solution?

GeorgeVJose avatar Oct 24 '25 01:10 GeorgeVJose

You mainly have to look at the image below. Basically, instead of adding an extra batch dimension to make your tensor of shape [B, N, C], Pointcept stacks everything along the first dimension and uses offset to know where each sample begins and ends. So you end up dealing with tensors of shape [B*N, C], which is not what you would normally do in a training loop. My suggestion is to stick to this mechanism and work with the default collate function, otherwise you risk messing things up at the beginning. As I said, I started working with this framework about 4 days ago, so I am still in the process of understanding how things work. Moreover, I have to run the training code on a remote cluster, so it's a little hard to debug properly. I'll paste below the code of the model I am currently using to regress a point on a mesh, so that you can either use it as a reference or point out potential bugs.

[Image: diagram of the offset-based batching mechanism]
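
To make the convention concrete, a minimal sketch with made-up sizes:

import torch

# Two samples with 4 and 6 points are packed into one flat tensor, and
# offset stores the cumulative end index of each sample.
coord = torch.cat([torch.randn(4, 3), torch.randn(6, 3)], dim=0)  # [10, 3]
offset = torch.tensor([4, 10])  # sample i spans coord[offset[i-1]:offset[i]]

sample_0 = coord[:offset[0]]           # [4, 3]
sample_1 = coord[offset[0]:offset[1]]  # [6, 3]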

import os
import json
from collections import defaultdict

import torch
import torch.nn as nn
from torch_scatter import segment_csr

# Assumed import paths; adjust to your Pointcept version.
from pointcept.models import build_model
from pointcept.models.utils.structure import Point


class VoxelBracketPredictor(nn.Module):
    def __init__(
        self,
        backbone,
        backbone_out_channels=96,
        output_dim=3,  # 3D point coordinates
        save_predictions=False,
        output_dir: str = "output",
    ):
        super().__init__()  
          
        self.backbone = build_model(backbone)  
        self.output_dir = output_dir
        self.save_predictions = save_predictions

        # Regression head: outputs 3D point coordinates  
        self.head = nn.Sequential(  
            nn.Linear(backbone_out_channels, 256),  
            nn.BatchNorm1d(256),  
            nn.ReLU(inplace=True),  
            nn.Dropout(p=0.3),  
            nn.Linear(256, 128),  
            nn.BatchNorm1d(128),  
            nn.ReLU(inplace=True),  
            nn.Dropout(p=0.3),  
            nn.Linear(128, output_dim),
        )  
        
    def _save(self, input_dict: dict, bracket_point_pred: torch.Tensor) -> None:
        """
        Writes the predicted points as JSON files.

        Each file will be named:
            {sample_name}_epoch{N}.json
        where N is how many times this function has been called for that sample.

        Args:
            input_dict (dict): Contains "name" (list of sample names without extension).
            bracket_point_pred (torch.Tensor): Predictions of shape [B, 3].
        """
        # Initialize a counter the first time this is called
        if not hasattr(self, "_save_counter"):
            self._save_counter = defaultdict(int)

        os.makedirs(self.output_dir, exist_ok=True)

        names = input_dict["name"]
        preds = bracket_point_pred.detach().cpu().numpy()  # convert to numpy for JSON serialization

        for name, coords in zip(names, preds):
            # Increment epoch counter for this specific sample
            self._save_counter[name] += 1
            epoch_idx = self._save_counter[name]

            # Construct output filename
            filename = f"{name}_epoch{epoch_idx}.json"
            filepath = os.path.join(self.output_dir, filename)

            # Write coordinates to JSON
            with open(filepath, "w") as f:
                json.dump({"coords": coords.tolist()}, f, indent=4)


    def forward(self, input_dict):
        point = self.backbone(input_dict)
        # Handle the Point structure returned by voxel-based backbones
        if isinstance(point, Point):
            # Global average pooling: pad offset with a leading zero to form
            # the CSR index pointer, then average features within each sample
            point.feat = segment_csr(
                src=point.feat,
                indptr=nn.functional.pad(point.offset, (1, 0)),
                reduce="mean",
            )
            feat = point.feat
        else:
            feat = point
     
        # Predict 3D point coordinates
        bracket_point_pred = self.head(feat)
        # Apparently only the loss value is needed in the output dict during
        # training? Predictions are saved separately below.
        out = {}

        # Compute MSE loss if ground truth is available
        if "bracket_point" in input_dict:
            # Here the collated target tensor has shape [B*3], while
            # bracket_point_pred has shape [B, 3]; this causes an error in
            # nn.functional.mse_loss, so we reshape the target to match
            # (I believe this is what we need to do).
            target = input_dict["bracket_point"].view_as(bracket_point_pred)
            if self.save_predictions and not self.training:
                self._save(input_dict, bracket_point_pred)
            loss = nn.functional.mse_loss(bracket_point_pred, target)
            out["loss"] = loss

        return out
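
For reference, a toy check of the segment_csr pooling used in forward() above (sizes are made up):

import torch
import torch.nn.functional as F
from torch_scatter import segment_csr

# Two samples with 3 and 2 points, feature dim 4, flat-packed as [5, 4].
feat = torch.arange(20, dtype=torch.float32).view(5, 4)
offset = torch.tensor([3, 5])     # cumulative point counts
indptr = F.pad(offset, (1, 0))    # [0, 3, 5], CSR index pointer
pooled = segment_csr(feat, indptr, reduce="mean")
print(pooled.shape)  # torch.Size([2, 4]) -> one pooled feature per sample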

theElandor avatar Oct 24 '25 07:10 theElandor

@theElandor Thank you for helping to clarify the batch design. This is the core of real point cloud learning, as real-world point clouds have uneven sizes. Please let me know about any issue that keeps you from understanding our architecture.

Gofinge avatar Nov 24 '25 08:11 Gofinge

@theElandor Thank you for the detailed explanation and the code example, I really appreciate it!

I completely understand the flat batching design used in Pointcept: points from multiple samples are concatenated into a single tensor of shape [total_N, C], and offset (or indptr) is used to mark sample boundaries.

However, my original concern wasn't about the format of the batching, but about the composition of each batch. While visualizing ScanNet inputs, I initially suspected a visualization error. To verify, I printed the length of offset during training and observed unexpected behavior.

What I'm actually trying to ask is that the length of offset does not match my expectation of the batch composition. Specifically, in single-GPU training I expected offset.shape[0] == batch_size, but the observed length deviates from this value. This makes me question whether the data loading pipeline is correctly assigning one scene per sample (i.e., per offset segment).

To clarify my core concern: Does each segment defined by offset[i-1]:offset[i] strictly correspond to exactly one ScanNet scene?

Thank you for your expertise — this would resolve my uncertainty about data integrity.

gitKincses avatar Nov 24 '25 10:11 gitKincses

Hey, we also have a mixing mechanism triggered by mix_prob (usually 0.8), which mixes point clouds pairwise within a batch; maybe this is what confused you. https://github.com/Pointcept/Pointcept/blob/main/pointcept/datasets/utils.py#L75
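
A sketch of the effect with made-up point counts (the exact merge pattern may differ; see the linked collate function):

import torch

# With batch_size = 4 and per-sample point counts [100, 120, 90, 110],
# the collated offset is the cumulative sum.
offset = torch.tensor([100, 220, 310, 420])

# Mixing drops every other boundary, merging consecutive pairs of point
# clouds into a single offset segment.
mixed = torch.cat([offset[1:-1:2], offset[-1:]])
print(mixed)  # tensor([220, 420]) -> offset.shape[0] == 2, not batch_size == 4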

Gofinge avatar Nov 30 '25 01:11 Gofinge

@Gofinge Thank you for the explanation! Now I understand. Thanks again for providing such a complete and practical project!

gitKincses avatar Dec 02 '25 11:12 gitKincses