Blank output after fine-tuning CPSAM model

Open souryasengupta opened this issue 7 months ago • 7 comments

I am fine-tuning Cellpose-SAM (cpsam) on my data, but the output is completely blank. The same data worked well when I fine-tuned the Cyto3 or Cyto2 models.

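Here is the code snippet:

```python
from cellpose import models, train

# images, labels, initial_learning_rate, n_epochs and model_name are defined elsewhere in the script
model = models.CellposeModel(model_type="cpsam", gpu=True)
train.train_seg(
    model.net,
    train_data=images,
    train_labels=labels,
    normalize=True,
    weight_decay=1e-4,
    SGD=False,
    learning_rate=initial_learning_rate,
    n_epochs=n_epochs,
    model_name=model_name
)
```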

souryasengupta, Jun 04 '25 18:06

I have the same problem.

fanweiya, Jun 05 '25 07:06

Hi @souryasengupta, I need more detail to understand the issue. Please share a complete, minimal code example and the logs of your training losses.

mrariden, Jun 05 '25 14:06

Here is my complete log (I omitted the tqdm training progress bars):

```
nohup: ignoring input
INFO:cellpose.io:not all flows are present, running flow generation for all images
INFO:cellpose.io:614 / 614 images in /home/souryas2/cellpose/cellpose_training_final/all_data/test_folder_qpi_live_rgb_0520 folder have labels
INFO:cellpose.io:not all flows are present, running flow generation for all images
INFO:cellpose.io:614 / 614 images in /home/souryas2/cellpose/cellpose_training_final/all_data/test_folder_qpi_live_rgb_0520 folder have labels
WARNING:cellpose.models:model_type argument is not used in v4.0.1+. Ignoring this argument...
INFO:cellpose.core:** TORCH CUDA version installed and working. **
INFO:cellpose.core:>>>> using GPU (CUDA)
INFO:cellpose.models:>>>> loading model /home/souryas2/.cellpose/models/cpsam
INFO:cellpose.dynamics:computing flows for labels


Welcome to CellposeSAM, cellpose v
cellpose version: 	4.0.3 
platform:       	linux 
python version: 	3.9.21 
torch version:  	2.6.0+cu124! The neural network component of
CPSAM is much larger than in previous versions and CPU excution is slow. 
We encourage users to use GPU/MPS if available. 


Using GPU ID: 0
✅ Data augmentation is enabled

  
INFO:cellpose.train:>>> computing diameters

1695.00it/s]/home/souryas2/miniconda3/envs/cellpose_env4/lib/python3.9/site-packages/numpy/_core/fromnumeric.py:3596: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/souryas2/miniconda3/envs/cellpose_env4/lib/python3.9/site-packages/numpy/_core/_methods.py:138: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)

100%|██████████| 614/614 [00:00<00:00, 1716.73it/s]
WARNING:cellpose.train:38 train images with number of masks less than min_train_masks (5), removing from train set
INFO:cellpose.train:>>> normalizing {'lowhigh': None, 'percentile': None, 'normalize': True, 'norm3D': True, 'sharpen_radius': 0, 'smooth_radius': 0, 'tile_norm_blocksize': 0, 'tile_norm_smooth3D': 1, 'invert': False}
INFO:cellpose.train:>>> n_epochs=25, n_train=576, n_test=None
INFO:cellpose.train:>>> AdamW, learning_rate=0.01000, weight_decay=0.00010
INFO:cellpose.train:>>> saving model to /home/souryas2/cellpose/cellpose_training_final/codes/models/cellpose_cpsam_test_folder_qpi_live_rgb_0520
INFO:cellpose.train:0, train_loss=1.3085, test_loss=0.0000, LR=0.000000, time 59.63s
INFO:cellpose.train:5, train_loss=2.4057, test_loss=0.0000, LR=0.005556, time 355.95s
INFO:cellpose.train:10, train_loss=2.4559, test_loss=0.0000, LR=0.010000, time 651.12s
INFO:cellpose.train:20, train_loss=2.4652, test_loss=0.0000, LR=0.010000, time 1240.88s
INFO:cellpose.train:saving network parameters to /home/souryas2/cellpose/cellpose_training_final/codes/models/cellpose_cpsam_test_folder_qpi_live_rgb_0520
```

souryasengupta, Jun 05 '25 21:06

Here is my full training code:

```python
import os
import argparse
import random
import logging

import numpy as np
import tifffile
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
from cellpose import models, io, train

logging.basicConfig(level=logging.INFO)

# === Argparse ===
parser = argparse.ArgumentParser(description="Train Cellpose model on custom data")
parser.add_argument('--input_dir', type=str, required=True, help="Relative input training folder under base_dir")
parser.add_argument('--test_dir', type=str, required=True, help="Relative test folder under base_dir")
parser.add_argument('--gpu_id', type=str, default="0", help="CUDA_VISIBLE_DEVICES index to use (default: 0)")
args = parser.parse_args()

# === Base and full paths ===
base_dir = "/home/souryas2/cellpose/cellpose_training_final/all_data/"
train_dir = os.path.join(base_dir, args.input_dir)
test_dir = os.path.join(base_dir, args.test_dir)

# === Set GPU ===
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu_id
print(f"Using GPU ID: {args.gpu_id}")

# === Model name ===
model_name = f"cellpose_cpsam_{args.input_dir}"
#model_path = f"/home/souryas2/cellpose/cellpose_training_final/models/{model_name}"

# === Dice score (for later evaluation; not used during training) ===
def dice_score(pred_mask, true_mask):
    pred_mask_bin = (pred_mask > 0.5).astype(np.uint8).flatten()
    true_mask_bin = (true_mask > 0).astype(np.uint8).flatten()
    return f1_score(true_mask_bin, pred_mask_bin)

# === Training parameters ===
n_epochs = 25
channel_to_use = 0   # not used below
second_channel = 0   # not used below
batch_size = 8       # not passed to train_seg below
initial_learning_rate = 0.01  # SGD=False below, so train_seg uses AdamW (see the log)
weight_decay = 1e-4

# === Data augmentation flag (only prints a message; it is not passed to train_seg) ===
use_data_augmentation = True
if use_data_augmentation:
    print("✅ Data augmentation is enabled")

# === Load training/testing data ===
output = io.load_train_test_data(
    train_dir, test_dir, image_filter="_img", mask_filter="_masks", look_one_level_down=False
)
images, labels, _, test_images, test_labels, _ = output

# === Subsample 50 random test examples for validation ===
# val_images/val_labels are not passed to train_seg, hence n_test=None and test_loss=0.0000 in the log
random.seed(42)
val_indices = random.sample(range(len(test_images)), min(50, len(test_images)))
val_images = [test_images[i] for i in val_indices]
val_labels = [test_labels[i] for i in val_indices]

# model_type is ignored in cellpose v4.0.1+ (see the warning in the log); cpsam is loaded anyway
model = models.CellposeModel(model_type="cpsam", gpu=True)

# === Train and save ===
model_path, train_losses, test_losses = train.train_seg(
    model.net,
    train_data=images,
    train_labels=labels,
    normalize=True,
    weight_decay=weight_decay,
    SGD=False,
    learning_rate=initial_learning_rate,
    n_epochs=n_epochs,
    model_name=model_name,
)
```

souryasengupta, Jun 05 '25 21:06

Your script looks generally okay, but your training losses don't look right. The loss should decrease over training, whereas yours increases, from 1.3085 at epoch 0 to 2.4652 at epoch 20. This suggests that something is wrong with your training data.
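One quick way to check the trend is to pull the `train_loss` values out of the log text. The sketch below uses hypothetical helpers (not part of cellpose) that assume the `cellpose.train:<epoch>, train_loss=..., test_loss=...` line format shown in the log in this thread, and compares the last logged train loss with the first.

```python
import re

# Matches cellpose training log lines like the ones in this thread, e.g.
# "INFO:cellpose.train:5, train_loss=2.4057, test_loss=0.0000, LR=0.005556, time 355.95s"
LOSS_LINE = re.compile(r"cellpose\.train:(\d+), train_loss=([\d.]+), test_loss=([\d.]+)")


def parse_losses(log_text):
    """Return a list of (epoch, train_loss, test_loss) tuples found in log_text."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in LOSS_LINE.finditer(log_text)
    ]


def loss_is_decreasing(losses):
    """Crude check: is the last logged train loss lower than the first one?"""
    if len(losses) < 2:
        return False
    return losses[-1][1] < losses[0][1]


log = """INFO:cellpose.train:0, train_loss=1.3085, test_loss=0.0000, LR=0.000000, time 59.63s
INFO:cellpose.train:5, train_loss=2.4057, test_loss=0.0000, LR=0.005556, time 355.95s
INFO:cellpose.train:10, train_loss=2.4559, test_loss=0.0000, LR=0.010000, time 651.12s
INFO:cellpose.train:20, train_loss=2.4652, test_loss=0.0000, LR=0.010000, time 1240.88s"""

losses = parse_losses(log)
print(losses)
print("decreasing:", loss_is_decreasing(losses))
```

Comparing only two points is deliberately crude. The log shows a learning-rate warmup (LR rises from 0 to 0.01 over the first 10 epochs), so plotting the whole curve is more informative than a first-vs-last check.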

Some questions to help troubleshoot:

  1. Is this the exact same dataset you had success with when using CP3?
  2. How many images do you have in your dataset?
  3. How many objects are in the images? (If they're blank or sparsely populated, there's little to train on.)
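Counting the objects in each label image answers question 3 directly. In cellpose label masks, 0 is background and each object has its own positive integer, so the number of distinct nonzero values is the object count. The `min_masks=5` threshold mirrors the `min_train_masks (5)` value reported in the log, where 38 images below it were removed from the training set; the helper names and toy masks are hypothetical.

```python
import numpy as np


def count_objects(mask):
    """Number of distinct nonzero labels in an instance mask (0 = background)."""
    labels = np.unique(mask)
    return int(np.count_nonzero(labels))


def summarize_masks(masks, min_masks=5):
    """Per-image object counts plus the indices of images with fewer than min_masks objects."""
    counts = [count_objects(m) for m in masks]
    sparse = [i for i, c in enumerate(counts) if c < min_masks]
    return counts, sparse


# Toy masks standing in for real training labels
blank = np.zeros((64, 64), dtype=np.uint16)
two_cells = blank.copy()
two_cells[5:15, 5:15] = 1
two_cells[30:40, 30:40] = 2
many_cells = blank.copy()
for k in range(6):
    many_cells[k * 10 : k * 10 + 5, 0:5] = k + 1  # six small square objects

counts, sparse = summarize_masks([blank, two_cells, many_cells])
print(counts, sparse)
```

Running this over the real `labels` list shows how much of the dataset is blank or nearly blank before any training time is spent.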

mrariden, Jun 06 '25 13:06

  1. Yes, exactly the same. The images were originally grayscale; I also tried converting them to 3 channels.
  2. There are 65k images in total.
  3. These are diverse datasets: some images have 20 objects, some only 2 or 3.
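Converting a grayscale image to three channels can be sketched in NumPy by repeating the single channel. The channel-last `(H, W, 3)` layout and the helper name are assumptions; check which layout your loader actually produces.

```python
import numpy as np


def gray_to_three_channels(img):
    """Replicate a 2-D grayscale image into an (H, W, 3) channel-last array."""
    if img.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {img.shape}")
    return np.repeat(img[:, :, np.newaxis], 3, axis=2)


gray = np.arange(12, dtype=np.float32).reshape(3, 4)
rgb = gray_to_three_channels(gray)
print(rgb.shape)
```

The three channels are identical copies, so this changes the array shape but adds no information to the image.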

souryasengupta, Jun 06 '25 17:06

I'd try (1) using the default learning rate, (2) using the default weight decay, and (3) training for more epochs (100+). Keep an eye on the loss as you experiment; if it isn't going down, the model isn't learning.
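A hedged sketch of that retraining run, reusing the variable names from the training script earlier in the thread: `learning_rate` and `weight_decay` are simply omitted so `train_seg` falls back to its built-in defaults, and the 50 validation images are passed as `test_data`/`test_labels` so the log can report a real test loss instead of 0.0000. The helper name is hypothetical, and the keyword names should be checked against the `train_seg` signature in your installed cellpose version.

```python
def retrain_with_defaults(images, labels, val_images, val_labels, model_name, n_epochs=100):
    """Hypothetical helper: fine-tune CPSAM with train_seg's default learning rate and weight decay."""
    from cellpose import models, train  # imported here so the sketch loads without cellpose installed

    # In cellpose v4 model_type is ignored and the cpsam weights load by default (see the log).
    model = models.CellposeModel(gpu=True)
    model_path, train_losses, test_losses = train.train_seg(
        model.net,
        train_data=images,
        train_labels=labels,
        test_data=val_images,    # held-out images, so test_loss is actually computed
        test_labels=val_labels,
        n_epochs=n_epochs,       # 100+ epochs
        model_name=model_name,
        # learning_rate and weight_decay intentionally omitted -> train_seg defaults
    )
    return model_path, train_losses, test_losses
```

The cellpose import sits inside the function only so the sketch can be loaded without cellpose installed; in a real script it belongs at the top of the file. Watching both returned loss arrays shows whether the model is learning and whether it generalizes to the held-out images.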

mrariden, Jun 06 '25 18:06

Closing due to inactivity.

mrariden, Jul 29 '25 17:07