Add Native 3-D Volume Training Mode and 3-D + T mask stitching to Cellpose-SAM

Open derekthirstrup opened this issue 7 months ago • 0 comments

Cellpose-SAM excels at 3-D inference, but training still expects 2-D slices.
This forces users to:

- pre-slice every z-stack, which discards the volumetric context (including anisotropic sampling) the model needs to learn flows
- manage voxel anisotropy by hand
- write extra stitching code after fine-tuning a model, which slows training iterations

Describe the solution you’d like

A --train_3d flag (CLI + GUI) that:
1. Accepts (Z, Y, X) or (C, Z, Y, X) volumes + masks
2. Uses a 3-D variant of the Cellpose head (3-D flow + distance outputs)
3. Handles voxel anisotropy internally
4. Supports patch-based sampling and mixed precision for GPU efficiency on consumer hardware such as an RTX 4090 or 5090
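To make points 1, 3 and 4 concrete, here is a minimal sketch of what patch sampling could look like. It is not existing Cellpose code: `sample_patch` and `patch_shape_for_anisotropy` are hypothetical names, and `anisotropy` is assumed to mean the Z spacing divided by the XY spacing, so the Z extent of a patch shrinks to cover a roughly cubic physical region.

```python
import numpy as np

def patch_shape_for_anisotropy(patch_xy, anisotropy):
    """Pick a Z extent so the patch spans a roughly cubic physical region.

    Assumption: anisotropy = z_spacing / xy_spacing (e.g. 2.0 when Z is
    sampled half as densely as X and Y).
    """
    patch_z = max(1, round(patch_xy / anisotropy))
    return (patch_z, patch_xy, patch_xy)

def sample_patch(volume, mask, patch_zyx, rng):
    """Crop one random patch from a (Z, Y, X) or (C, Z, Y, X) volume and its mask."""
    if volume.ndim == 3:
        volume = volume[np.newaxis]  # promote (Z, Y, X) to (1, Z, Y, X)
    if volume.shape[1:] != mask.shape:
        raise ValueError("volume and mask must share the same (Z, Y, X) shape")
    starts = []
    for size, patch in zip(mask.shape, patch_zyx):
        if patch > size:
            raise ValueError("patch is larger than the volume")
        # Uniform random corner such that the patch stays inside the volume.
        starts.append(int(rng.integers(0, size - patch + 1)))
    z, y, x = starts
    pz, py, px = patch_zyx
    image_patch = volume[:, z:z + pz, y:y + py, x:x + px]
    mask_patch = mask[z:z + pz, y:y + py, x:x + px]
    return image_patch, mask_patch

# Example: a 2-channel stack whose Z spacing is 4x the XY spacing.
rng = np.random.default_rng(0)
volume = np.zeros((2, 20, 128, 128), dtype=np.float32)
labels = np.zeros((20, 128, 128), dtype=np.int32)
shape = patch_shape_for_anisotropy(64, 4.0)
image_patch, mask_patch = sample_patch(volume, labels, shape, rng)
```

Sampling fixed-size patches keeps GPU memory use bounded regardless of the size of the full volume, which is the main reason to prefer it over training on whole stacks.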

3-D + T mask stitching for time-lapse tracking

Cellpose already exposes a basic stitching switch (stitch_threshold > 0) that merges 2-D masks across adjacent Z planes by IoU. The new --train_3d workflow should extend that idea to temporal stitching so a single instance ID can be followed through the successive volumes of a (T, Z, Y, X) series:

Input layout: stacks shaped (T, Z, Y, X) plus matching (T, Z, Y, X) label masks.
Usage: python -m cellpose ... --train_3d --stitch_3dt --stitch_threshold 0.7
- --stitch_3dt activates both spatial and temporal stitching.
- --stitch_threshold (0–1) is the IoU cutoff used to merge a mask in frame t with its best-overlapping mask in frame t + 1.
Algorithm: for every time step t, compute the pairwise 3-D IoU between masks at t and masks at t + 1.
- Build a bipartite graph and solve a max-IoU assignment (Hungarian or greedy) subject to the threshold.
- Propagate the parent ID forward; create a new ID if no match is found.
- Optionally fill gaps ≤ gap_max frames with linear interpolation of centroids.
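The matching steps above can be sketched as follows. This is an illustration only, not Cellpose code: `iou_matrix`, `greedy_match` and `stitch_time` are hypothetical names, the greedy variant is used in place of the Hungarian solver, the input is assumed to be an integer (T, Z, Y, X) label stack with 0 as background, and gap filling is left out.

```python
import numpy as np

def iou_matrix(prev, curr):
    """Pairwise 3-D IoU between the labels of two (Z, Y, X) masks."""
    prev_ids = np.unique(prev[prev > 0])
    curr_ids = np.unique(curr[curr > 0])
    iou = np.zeros((len(prev_ids), len(curr_ids)))
    for i, p in enumerate(prev_ids):
        p_vox = prev == p
        for j, c in enumerate(curr_ids):
            c_vox = curr == c
            inter = np.logical_and(p_vox, c_vox).sum()
            if inter:
                iou[i, j] = inter / np.logical_or(p_vox, c_vox).sum()
    return prev_ids, curr_ids, iou

def greedy_match(iou, threshold):
    """Pair rows and columns in descending IoU order, keeping IoU >= threshold."""
    pairs, used_rows, used_cols = [], set(), set()
    for flat in np.argsort(iou, axis=None)[::-1]:
        r, c = (int(v) for v in np.unravel_index(flat, iou.shape))
        if iou[r, c] < threshold or iou[r, c] == 0:
            break  # every remaining pair overlaps even less
        if r not in used_rows and c not in used_cols:
            pairs.append((r, c))
            used_rows.add(r)
            used_cols.add(c)
    return pairs

def stitch_time(masks, threshold=0.7):
    """Relabel a (T, Z, Y, X) stack so matched objects keep one ID over time."""
    out = np.zeros_like(masks)
    out[0] = masks[0]
    next_id = int(masks[0].max()) + 1
    for t in range(1, masks.shape[0]):
        # Match against the already relabeled previous frame so IDs propagate.
        prev_ids, curr_ids, iou = iou_matrix(out[t - 1], masks[t])
        parent = {int(curr_ids[c]): int(prev_ids[r])
                  for r, c in greedy_match(iou, threshold)}
        for c in curr_ids:
            c = int(c)
            if c not in parent:  # no match found: start a new track
                parent[c] = next_id
                next_id += 1
            out[t][masks[t] == c] = parent[c]
    return out
```

The nested loop in `iou_matrix` is quadratic in the number of objects per frame; a real implementation would probably restrict comparisons to overlapping bounding boxes, but the simple form keeps the logic easy to check.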
Outputs: saves a (Z, Y, X, T) mask stack where each nucleus keeps the same label across time, plus a .csv track table:

track_id first_frame last_frame mean_volume …
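As a rough sketch of how the named columns could be derived from a stitched stack: the helper names `track_table` and `write_track_csv` are hypothetical, time is assumed to be the first axis (use `np.moveaxis` if it is stored last), and `mean_volume` is assumed to be the voxel count averaged over the frames in which the track is present. Only the four columns named above are produced.

```python
import csv
import io
import numpy as np

COLUMNS = ["track_id", "first_frame", "last_frame", "mean_volume"]

def track_table(stitched):
    """Summarize each label of a (T, Z, Y, X) stitched mask stack as one row."""
    rows = []
    n_frames = stitched.shape[0]
    for track_id in np.unique(stitched[stitched > 0]):
        # Voxel count of this track in every frame.
        volumes = (stitched == track_id).reshape(n_frames, -1).sum(axis=1)
        frames = np.flatnonzero(volumes)  # frames where the track exists
        rows.append({
            "track_id": int(track_id),
            "first_frame": int(frames[0]),
            "last_frame": int(frames[-1]),
            "mean_volume": float(volumes[frames].mean()),
        })
    return rows

def write_track_csv(rows, handle):
    """Write the rows to an open text handle as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

# Example with two short tracks in a tiny stack.
stitched = np.zeros((3, 2, 4, 4), dtype=np.int32)
stitched[0, :, 0:2, 0:2] = 1
stitched[1, :, 0:2, 0:2] = 1
stitched[1, 0, 3, 3] = 2
stitched[2, :, 3, 3] = 2
rows = track_table(stitched)
buffer = io.StringIO()
write_track_csv(rows, buffer)
```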
Why integrate now: scientists training 3-D models usually work with time-lapse data (e.g. spheroid growth, embryo development). Training and tracking in the same UI lowers friction and guarantees mask compatibility.

Jun 02 '25 16:06 derekthirstrup