Add Native 3-D Volume Training Mode and 3-D + T mask stitching to Cellpose-SAM

Status: Open. derekthirstrup opened this issue 7 months ago • 0 comments

Cellpose-SAM excels at 3-D inference, but training still expects 2-D slices.
This forces users to:

  • pre-slice every z-stack, losing the volumetric, anisotropically sampled context the model could use to learn its flow embeddings
  • handle voxel anisotropy by hand
  • write extra stitching code after fine-tuning a model, which slows down training iterations

Describe the solution you’d like

  • A --train_3d flag (CLI + GUI) that:
    1. Accepts (Z, Y, X) or (C, Z, Y, X) volumes + masks
    2. Uses a 3-D variant of the Cellpose head (3-D flow + distance outputs)
    3. Handles voxel anisotropy internally
    4. Supports patch-based sampling and mixed-precision training for GPU efficiency on consumer hardware such as an RTX 4090 or 5090
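The loader behaviour in points 3 and 4 can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not Cellpose code: `resample_z` and `sample_patch` are hypothetical helpers, and `anisotropy` is assumed to be the Z step divided by the XY pixel size (the convention of Cellpose's `anisotropy` argument for 3-D inference, e.g. 2.0 when Z is sampled half as densely as X and Y). Masks are resampled by nearest-neighbour lookup so label IDs are never blended.

```python
import numpy as np


def resample_z(volume, anisotropy, is_mask=False):
    """Stretch the Z axis of a (Z, Y, X) or (C, Z, Y, X) array by `anisotropy`
    (assumed: Z step / XY pixel size) so voxels become roughly isotropic."""
    z_axis = volume.ndim - 3
    nz = volume.shape[z_axis]
    new_nz = max(1, int(round(nz * anisotropy)))
    # Positions of the new planes expressed in original Z coordinates
    z_new = np.linspace(0, nz - 1, new_nz)
    if is_mask:
        # Nearest-neighbour lookup keeps integer label IDs intact
        return np.take(volume, np.rint(z_new).astype(int), axis=z_axis)
    # Images: linear interpolation between the two neighbouring planes
    lo = np.floor(z_new).astype(int)
    hi = np.minimum(lo + 1, nz - 1)
    shape = [-1 if axis == z_axis else 1 for axis in range(volume.ndim)]
    w = (z_new - lo).reshape(shape)
    return (1 - w) * np.take(volume, lo, axis=z_axis) + w * np.take(volume, hi, axis=z_axis)


def sample_patch(volume, mask, patch=(32, 128, 128), rng=None):
    """Crop one random (pz, py, px) patch from a volume and its (Z, Y, X) mask,
    zero-padding any axis that is shorter than the patch."""
    rng = np.random.default_rng() if rng is None else rng
    pad = [(0, max(0, p - s)) for s, p in zip(mask.shape, patch)]
    mask = np.pad(mask, pad)
    volume = np.pad(volume, [(0, 0)] * (volume.ndim - 3) + pad)
    starts = [int(rng.integers(0, s - p + 1)) for s, p in zip(mask.shape, patch)]
    window = tuple(slice(st, st + p) for st, p in zip(starts, patch))
    # Ellipsis leaves a leading channel axis (if any) untouched
    return volume[(Ellipsis,) + window], mask[window]
```

Mixed precision would sit in the training step rather than in the loader (for example PyTorch's `torch.autocast`), so it is not shown here.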

3-D + T mask stitching for time-lapse tracking

Cellpose already exposes a rudimentary stitching switch (stitch_threshold > 0) that, when a 3-D stack is segmented slice by slice, merges 2-D masks in adjacent Z planes whose IoU exceeds the threshold. The new --train_3d workflow should extend that idea to temporal stitching, so that a single instance ID can be followed through successive volumes (Z, Y, X, T):

  • Input layout: stacks shaped (T, Z, Y, X) plus matching (T, Z, Y, X) label masks.

  • Usage: python -m cellpose ... --train_3d --stitch_3dt --stitch_threshold 0.7

    • --stitch_3dt activates both spatial and temporal stitching.
    • --stitch_threshold (0–1) is the IoU cutoff used to merge a mask in frame t with its best-overlapping mask in frame t + 1.
  • Algorithm: in every time step, compute pairwise 3-D IoU between masks at t and masks at t + 1.

    • Build a bipartite graph and solve a max-IoU assignment (Hungarian or greedy) subject to the threshold.
    • Propagate the parent ID forward; create a new ID if no match is found.
    • Optionally fill gaps ≤ gap_max frames with linear interpolation of centroids.
  • Outputs: saves a (Z, Y, X, T) mask stack in which each nucleus keeps the same label across time, plus a .csv track table with the columns:

    track_id first_frame last_frame mean_volume
  • Why integrate now: scientists training 3-D models often work with time-lapse data (e.g. spheroid growth, embryo development). Training and tracking in the same UI lowers friction and ensures that the masks used for training and for tracking are compatible.
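The tracking algorithm described above can be sketched in NumPy. Assumptions: masks arrive as an integer (T, Z, Y, X) array with 0 as background; `iou_matrix`, `greedy_match`, `track`, `track_table` and `write_track_csv` are hypothetical names, not existing Cellpose functions; matching uses the greedy option from the proposal (for an optimal Hungarian assignment, `scipy.optimize.linear_sum_assignment(iou, maximize=True)` could replace `greedy_match`); and gap filling by centroid interpolation is omitted. `np.moveaxis(tracked, 0, -1)` converts the result to the (Z, Y, X, T) output layout.

```python
import csv

import numpy as np


def iou_matrix(prev, curr):
    """Pairwise 3-D IoU between labels of two integer masks (0 = background).
    Returns iou[i, j] for prev_ids[i] vs curr_ids[j], plus both ID arrays."""
    p_ids, p_inv = np.unique(prev.ravel(), return_inverse=True)
    c_ids, c_inv = np.unique(curr.ravel(), return_inverse=True)
    # Joint histogram: overlap[i, j] = voxels labelled p_ids[i] in prev and c_ids[j] in curr
    overlap = np.bincount(p_inv * len(c_ids) + c_inv,
                          minlength=len(p_ids) * len(c_ids)).reshape(len(p_ids), len(c_ids))
    union = overlap.sum(axis=1, keepdims=True) + overlap.sum(axis=0, keepdims=True) - overlap
    iou = overlap / union
    keep_p, keep_c = p_ids > 0, c_ids > 0  # drop the background row/column
    return iou[keep_p][:, keep_c], p_ids[keep_p], c_ids[keep_c]


def greedy_match(iou, threshold):
    """Greedy max-IoU assignment: take the best remaining pair until none reaches the threshold."""
    iou = iou.astype(float).copy()
    pairs = []
    while iou.size:
        i, j = np.unravel_index(np.argmax(iou), iou.shape)
        if iou[i, j] <= 0 or iou[i, j] < threshold:
            break
        pairs.append((int(i), int(j)))
        iou[i, :] = -1.0  # each object is matched at most once
        iou[:, j] = -1.0
    return pairs


def track(masks, threshold=0.7):
    """Relabel a (T, Z, Y, X) stack so matched objects keep one track ID over time."""
    tracked = np.zeros_like(masks)
    next_id = 1
    prev_map = {}  # label in frame t-1 -> track ID
    for t in range(masks.shape[0]):
        curr_map = {}
        if t > 0:
            iou, p_ids, c_ids = iou_matrix(masks[t - 1], masks[t])
            for i, j in greedy_match(iou, threshold):
                curr_map[int(c_ids[j])] = prev_map[int(p_ids[i])]  # propagate parent ID
        for label in np.unique(masks[t][masks[t] > 0]):
            label = int(label)
            if label not in curr_map:  # unmatched: start a new track
                curr_map[label] = next_id
                next_id += 1
            tracked[t][masks[t] == label] = curr_map[label]
        prev_map = curr_map
    return tracked


def track_table(tracked):
    """One row per track: (track_id, first_frame, last_frame, mean_volume).
    mean_volume is in voxels, averaged over the frames where the track is present."""
    rows = []
    for tid in np.unique(tracked[tracked > 0]):
        volumes = (tracked == tid).reshape(tracked.shape[0], -1).sum(axis=1)
        frames = np.flatnonzero(volumes)
        rows.append((int(tid), int(frames[0]), int(frames[-1]), float(volumes[frames].mean())))
    return rows


def write_track_csv(rows, path):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["track_id", "first_frame", "last_frame", "mean_volume"])
        writer.writerows(rows)
```

Greedy matching is simple and fast, but when cells touch it can make locally optimal choices that a Hungarian assignment would avoid, which is one reason to expose both options.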

derekthirstrup, Jun 02 '25 16:06