Segmentation Methodology

Open eric-czech opened this issue 6 years ago • 1 comments

Relevant section from the CODEX preprint:

A 3D segmentation algorithm was therefore created to combine information from the nuclear staining and a ubiquitous membrane marker (in this case CD45) to define single-cell boundaries in crowded images such as lymphoid tissues. For each segmented object (i.e., cell) a marker expression profile, as well as the identities of the nearby neighbors were recorded (using Delaunay triangulation)
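For reference, the neighbor-recording step mentioned in that excerpt can be sketched in a few lines. This is only an illustration and not the CODEX implementation: it assumes cell centers are already available as coordinates, uses SciPy's `scipy.spatial.Delaunay`, and the helper name `delaunay_neighbors` is made up here.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_neighbors(centers):
    """Map each point index to the sorted indices of its Delaunay neighbors.

    `centers` is an (n_points, n_dims) array-like of cell centers (2D or 3D).
    Hypothetical helper for illustration; needs at least n_dims + 1 points
    that are not all collinear/coplanar.
    """
    points = np.asarray(centers, dtype=float)
    tri = Delaunay(points)
    # SciPy stores the adjacency in CSR-like form: the neighbors of vertex k
    # are indices[indptr[k]:indptr[k + 1]].
    indptr, indices = tri.vertex_neighbor_vertices
    return {
        k: sorted(indices[indptr[k]:indptr[k + 1]].tolist())
        for k in range(len(points))
    }
```

Two cells count as neighbors here whenever their centers share an edge in the triangulation, so the graph depends only on the centers and not on the segmented cell outlines.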

Software

Expanding on that list a bit:

Models Specific to Medical Imaging

  • U-Net (Example TF-based implementation) - This appears to be a real workhorse architecture in medical image segmentation (there are dozens of implementations in TensorFlow and Caffe)
  • V-Net - A TensorFlow implementation of a 3D extension of the U-Net
  • NiftyNet (Site) - "NiftyNet is a TensorFlow-based open-source convolutional neural networks (CNN) platform for research in medical image analysis and image-guided therapy."
    • If we have to retrain an architecture for segmentation I have to imagine this would be a top choice.
    • Supports 2-D, 2.5-D, 3-D, 4-D inputs
    • It has a Model Zoo but nothing in there for our modality yet, or anything even close
    • Original Publication: https://arxiv.org/abs/1709.03485

Generic Architectures

  • DeepLab (Google Research Post) - Google research project in the vein of Detectron
    • My gut says we'd never have enough data to train these big general kinds of models but who knows
  • SegNet - Another generic architecture for semantic segmentation which I only mention because it was brought up along with U-Nets in this webinar on advances in medical image analysis

Comments from @nsamusik on some things to keep in mind:

My main thought at this point is that the segmentation itself is just the first step, there also has to be a second step, where cell boundaries are optimized concomitantly with estimating the single-cell expression vectors. This way both the optimized cell boundaries and the expression data will likely look more accurate.

As for the benchmarking, I am happy to share a hand-labelled dataset that I generated for the CODEX paper revisions. Each TIFF is matched with a TXT file that contains the coordinates of hand-labelled cell centers (X, Y, Z). There are no cell outlines labelled here, just the centers. To assess the segmentation quality, I computed several measures: R = Recall (% of hand-labelled centers that ended up within a segmented cell region), S = Singlets (of those, the % that ended up in a cell region with exactly 1 hand-labelled center), FPR = False positive rate (% of cell regions without a hand-labelled center). Then I combined the three in a harmonic mean: 3/(1/R + 1/S + 1/(1-FPR))

Here's the link: https://drive.google.com/open?id=1wUNaZ5dv2mDn_wwcSXlnfof6SwoQmlsq
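A rough sketch of how the R / S / FPR score described above could be computed from a label volume follows. It is written from the description in the comment, not from the original scoring code, and makes several assumptions: the segmentation is an integer array indexed `[z, y, x]` with 0 as background, centers are in-bounds voxel coordinates in (X, Y, Z) order, and the function name `score_segmentation` is invented for this example.

```python
import numpy as np


def score_segmentation(labels, centers):
    """Score a labelled volume against hand-labelled cell centers.

    labels  : integer array indexed [z, y, x]; 0 = background, k > 0 = region k
              (axis order is an assumption of this sketch).
    centers : iterable of (x, y, z) voxel coordinates, assumed to be in bounds.
    """
    labels = np.asarray(labels)
    centers = np.rint(np.asarray(centers, dtype=float)).astype(int).reshape(-1, 3)
    if len(centers) == 0:
        raise ValueError("need at least one hand-labelled center")

    x, y, z = centers.T
    hit = labels[z, y, x]      # region id under each center (0 = background)
    inside = hit[hit > 0]      # centers that landed in some segmented region

    # R: fraction of centers that fall inside a segmented region.
    recall = inside.size / len(centers)

    # S: of those centers, the fraction that are alone in their region.
    region_ids, counts = np.unique(inside, return_counts=True)
    singlets = np.count_nonzero(counts == 1) / inside.size if inside.size else 0.0

    # FPR: fraction of segmented regions that contain no center at all.
    n_regions = np.unique(labels[labels > 0]).size
    fpr = 1.0 - region_ids.size / n_regions if n_regions else 0.0

    # Harmonic mean of R, S and (1 - FPR); defined as 0 if any term is 0.
    terms = (recall, singlets, 1.0 - fpr)
    combined = 3.0 / sum(1.0 / t for t in terms) if all(t > 0 for t in terms) else 0.0

    return {"R": recall, "S": singlets, "FPR": fpr, "combined": combined}
```

Because the combined score is a harmonic mean, a segmentation that does badly on any one of the three measures scores low overall, even if the other two are near 1.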

eric-czech, May 01 '18 15:05

Recommendations from Allen Goodman (works on CellProfiler):

  • https://github.com/broadinstitute/keras-rcnn
  • https://github.com/raghakot/keras-resnet

Datasets for benchmarking:

  • http://cocodataset.org/#home
  • https://data.broadinstitute.org/bbbc/BBBC038/

Notes from the CellProfiler team on other methods they've considered:

  • morphological (watershed)
  • k-means
  • diffusion
  • geometric (Active Contour)

They also mentioned that simulation tools like CytoPacq (used to generate some of the datasets in the Broad Bioimage Benchmark Collection) were not useful.

eric-czech, May 02 '18 22:05