Blank output after fine-tuning CPSAM model
I am fine-tuning Cellpose-SAM (cpsam) with my data, but the output is completely blank. The same data worked well when fine-tuning the Cyto3 or Cyto2 models.
The code snippet is below:
```python
model = models.CellposeModel(model_type="cpsam", gpu=True)

train.train_seg(
    model.net,
    train_data=images,
    train_labels=labels,
    normalize=True,
    weight_decay=1e-4,
    SGD=False,
    learning_rate=initial_learning_rate,
    n_epochs=n_epochs,
    model_name=model_name
)
```
I have the same question
Hi @souryasengupta, I'm going to need more detail to understand the issue. Please share a complete, minimal code example, along with the logs of your losses.
This is my complete log (I've omitted the training-progress tqdm bars):
```
nohup: ignoring input
INFO:cellpose.io:not all flows are present, running flow generation for all images
INFO:cellpose.io:614 / 614 images in /home/souryas2/cellpose/cellpose_training_final/all_data/test_folder_qpi_live_rgb_0520 folder have labels
INFO:cellpose.io:not all flows are present, running flow generation for all images
INFO:cellpose.io:614 / 614 images in /home/souryas2/cellpose/cellpose_training_final/all_data/test_folder_qpi_live_rgb_0520 folder have labels
WARNING:cellpose.models:model_type argument is not used in v4.0.1+. Ignoring this argument...
INFO:cellpose.core:** TORCH CUDA version installed and working. **
INFO:cellpose.core:>>>> using GPU (CUDA)
INFO:cellpose.models:>>>> loading model /home/souryas2/.cellpose/models/cpsam
INFO:cellpose.dynamics:computing flows for labels
Welcome to CellposeSAM, cellpose v
cellpose version: 4.0.3
platform: linux
python version: 3.9.21
torch version: 2.6.0+cu124! The neural network component of
CPSAM is much larger than in previous versions and CPU execution is slow.
We encourage users to use GPU/MPS if available.
Using GPU ID: 0
✅ Data augmentation is enabled
INFO:cellpose.train:>>> computing diameters
1695.00it/s]/home/souryas2/miniconda3/envs/cellpose_env4/lib/python3.9/site-packages/numpy/_core/fromnumeric.py:3596: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/souryas2/miniconda3/envs/cellpose_env4/lib/python3.9/site-packages/numpy/_core/_methods.py:138: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
100%|██████████| 614/614 [00:00<00:00, 1716.73it/s]
WARNING:cellpose.train:38 train images with number of masks less than min_train_masks (5), removing from train set
INFO:cellpose.train:>>> normalizing {'lowhigh': None, 'percentile': None, 'normalize': True, 'norm3D': True, 'sharpen_radius': 0, 'smooth_radius': 0, 'tile_norm_blocksize': 0, 'tile_norm_smooth3D': 1, 'invert': False}
INFO:cellpose.train:>>> n_epochs=25, n_train=576, n_test=None
INFO:cellpose.train:>>> AdamW, learning_rate=0.01000, weight_decay=0.00010
INFO:cellpose.train:>>> saving model to /home/souryas2/cellpose/cellpose_training_final/codes/models/cellpose_cpsam_test_folder_qpi_live_rgb_0520
INFO:cellpose.train:0, train_loss=1.3085, test_loss=0.0000, LR=0.000000, time 59.63s
INFO:cellpose.train:5, train_loss=2.4057, test_loss=0.0000, LR=0.005556, time 355.95s
INFO:cellpose.train:10, train_loss=2.4559, test_loss=0.0000, LR=0.010000, time 651.12s
INFO:cellpose.train:20, train_loss=2.4652, test_loss=0.0000, LR=0.010000, time 1240.88s
INFO:cellpose.train:saving network parameters to /home/souryas2/cellpose/cellpose_training_final/codes/models/cellpose_cpsam_test_folder_qpi_live_rgb_0520
```
Also, here is my full training code:
```python
import os
import argparse
import random
import numpy as np
import tifffile
import matplotlib.pyplot as plt
from cellpose import models, io, train
from sklearn.metrics import f1_score
import logging

logging.basicConfig(level=logging.INFO)

# === Argparse ===
parser = argparse.ArgumentParser(description="Train Cellpose model on custom data")
parser.add_argument('--input_dir', type=str, required=True, help="Relative input training folder under base_dir")
parser.add_argument('--test_dir', type=str, required=True, help="Relative test folder under base_dir")
parser.add_argument('--gpu_id', type=str, default="0", help="CUDA_VISIBLE_DEVICES index to use (default: 0)")
args = parser.parse_args()

# === Base and full paths ===
base_dir = "/home/souryas2/cellpose/cellpose_training_final/all_data/"
train_dir = os.path.join(base_dir, args.input_dir)
test_dir = os.path.join(base_dir, args.test_dir)

# === Set GPU ===
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu_id
print(f"Using GPU ID: {args.gpu_id}")

# === Model Name and Path ===
model_name = f"cellpose_{args.input_dir}"
#model_path = f"/home/souryas2/cellpose/cellpose_training_final/models/{model_name}"

# === Dice score function ===
def dice_score(pred_mask, true_mask):
    pred_mask_bin = (pred_mask > 0.5).astype(np.uint8).flatten()
    true_mask_bin = (true_mask > 0).astype(np.uint8).flatten()
    return f1_score(true_mask_bin, pred_mask_bin)

# === Training Parameters ===
n_epochs = 25
channel_to_use = 0
second_channel = 0
batch_size = 8
initial_learning_rate = 0.01  # using Adam optimizer
weight_decay = 1e-4

# === Data Augmentation Flag ===
use_data_augmentation = True
if use_data_augmentation:
    print("✅ Data augmentation is enabled")

# === Load training/testing data ===
output = io.load_train_test_data(
    train_dir, test_dir, image_filter="_img", mask_filter="_masks", look_one_level_down=False
)
images, labels, _, test_images, test_labels, _ = output

# === Subsample 50 random test examples for validation ===
random.seed(42)
val_indices = random.sample(range(len(test_images)), min(50, len(test_images)))
val_images = [test_images[i] for i in val_indices]
val_labels = [test_labels[i] for i in val_indices]

model = models.CellposeModel(model_type="cpsam", gpu=True)
model_name = f"cellpose_cpsam_{args.input_dir}"  # as before

# === Train and save every epoch ===
model_path, train_losses, test_losses = train.train_seg(
    model.net,
    train_data=images,
    train_labels=labels,
    normalize=True,
    weight_decay=1e-4,
    SGD=False,
    learning_rate=initial_learning_rate,
    n_epochs=n_epochs,
    model_name=model_name
)
```
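For context, here is a minimal sketch of how the "blank output" can be checked after training. This is not part of the original script: it assumes the cellpose v4 `CellposeModel(pretrained_model=...)` / `eval` API and reuses the `model_path` and `val_images` variables defined above.

```python
import numpy as np
from cellpose import models

# Reload the fine-tuned weights saved by train.train_seg.
finetuned = models.CellposeModel(gpu=True, pretrained_model=str(model_path))

# Segment one held-out validation image and count the predicted objects.
masks, flows, styles = finetuned.eval(val_images[0])
n_objects = len(np.unique(masks)) - 1  # label 0 is background
print(f"objects predicted: {n_objects}")  # 0 means the output is blank
```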
Your script generally looks okay, but your training losses don't look right: loss should decrease, while yours increases. This suggests that something is wrong with your training data.
Some questions to help troubleshoot:
- Is this the exact same dataset you had success with when using CP3?
- How many images do you have in your dataset?
- How many objects are in the images? (If they're blank or sparsely populated, there's little to train on; a quick check is sketched below.)
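A rough sketch of that check (it assumes the `labels` list returned by `io.load_train_test_data` in your script):

```python
import numpy as np

# Count labelled objects in each training mask (label 0 is background).
n_objects = [len(np.unique(lbl)) - 1 for lbl in labels]
print(f"min={min(n_objects)}, median={int(np.median(n_objects))}, max={max(n_objects)}")
# Images below min_train_masks (5) are dropped from the train set, as in your log.
print(f"images with < 5 objects: {sum(n < 5 for n in n_objects)}")
```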
- Yes, exactly the same dataset. The images were originally grayscale; I also tried converting them to 3 channels.
- There are 65k in total.
- These are diverse datasets; some images have around 20 objects, some have only 2 or 3.
I'd try to (1) use the default LR, (2) use the default weight decay, and (3) train for more epochs (100+), roughly as sketched below. Keep an eye on the loss as you experiment; if it's not going down, the model isn't learning.
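A hedged sketch of that suggestion (not a verified fix): drop the explicit `learning_rate`/`weight_decay` so `train_seg` falls back to its defaults, raise `n_epochs`, and watch the returned losses. Variable names are the ones from your script.

```python
# Same call as in the script above, but relying on train_seg's default
# learning rate and weight decay, and training for 100+ epochs.
model_path, train_losses, test_losses = train.train_seg(
    model.net,
    train_data=images,
    train_labels=labels,
    normalize=True,
    SGD=False,
    n_epochs=100,
    model_name=model_name,
)
print(train_losses)  # should trend downward; if not, the model isn't learning
```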
Closing due to inactivity.