cytokit
Segmentation Methodology
Relevant section from the CODEX preprint:
A 3D segmentation algorithm was therefore created to combine information from the nuclear staining and a ubiquitous membrane marker (in this case CD45) to define single-cell boundaries in crowded images such as lymphoid tissues. For each segmented object (i.e., cell) a marker expression profile, as well as the identities of the nearby neighbors were recorded (using Delaunay triangulation)
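To make the neighbor-recording step concrete, here is a minimal sketch of finding each cell's Delaunay neighbors from its centroid. It is not the CODEX authors' code: it assumes the centroids of the segmented cells are already available as an (N, 3) array, it uses SciPy's `scipy.spatial.Delaunay`, and `delaunay_neighbors` is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_neighbors(centroids):
    """Map each cell index to the set of cell indices it shares a Delaunay edge with.

    centroids: (N, 3) array of cell centers (assumed input, one row per cell).
    """
    tri = Delaunay(np.asarray(centroids, dtype=float))
    # SciPy stores the adjacency in CSR-like form: the neighbors of vertex k
    # are indices[indptr[k]:indptr[k + 1]].
    indptr, indices = tri.vertex_neighbor_vertices
    return {
        k: set(indices[indptr[k]:indptr[k + 1]].tolist())
        for k in range(len(tri.points))
    }


if __name__ == "__main__":
    # Four cells at the corners of a tetrahedron plus one cell in the middle.
    demo = np.array(
        [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0.2, 0.2, 0.2]],
        dtype=float,
    )
    for cell, neighbors in delaunay_neighbors(demo).items():
        print(cell, sorted(neighbors))
```

Delaunay edges can connect cells that are far apart at the edge of the tissue, so a distance cutoff on top of this is probably worth considering.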
Software
- CellProfiler: uses ilastik for segmentation, cf. blog post
- scikit-image: paper, blog post
- CellSegm: paper, code
- DALMATIAN: paper, code
- TissueMiner: paper, code
- FogBank: paper, code
- BioVoxxel toolbox: paper, code
- Detectron
Expanding on that list a bit:
- CellProfiler:
  - There is another blog post that mentions volumetric segmentation (as opposed to ilastik which afaik is only 2D). It's a little unclear exactly what capabilities they are referring to within CellProfiler but it may simply be this 3D watershed implementation: watershed.py. I don't see any modules in CellProfiler for segmentation that are also designed to work in 3D.
  - This post also mentions the Allen Cell Explorer which seems like a great way to curate 3D volumes
  - In the forum post associated with that blog post above, they also mention the Google Accelerated Sciences plugin for Scoring Image Focus and a pre-trained deep learning model for cell segmentation based on the paper "Automated Training of Deep Convolutional Neural Networks for Cell Segmentation"
- DeepCell (site) (repo)
- DeepFlow (original version):
  - Reconstructs the cell cycle for T-cells, identifying 7 different phases and distinguishing dead from live cells
  - MXNet implementation on GitHub (from the scanpy creators)
  - This is a bit off topic for segmentation but the T-cell imaging data used for it could be useful
Models Specific to Medical Imaging
- U-Net (Example TF-based implementation) - This appears to be a real workhorse architecture in medical image segmentation (there are dozens of implementations in TensorFlow and Caffe)
- V-Net - A TensorFlow implementation of 3D extensions to the U-Net
- NiftyNet (Site) - "NiftyNet is a TensorFlow-based open-source convolutional neural networks (CNN) platform for research in medical image analysis and image-guided therapy."
  - If we have to retrain an architecture for segmentation I have to imagine this would be a top choice.
  - Supports 2-D, 2.5-D, 3-D, 4-D inputs
  - It has a Model Zoo but nothing in there for our modality yet, or anything even close
  - [Original Publication](https://arxiv.org/abs/1709.03485)
Generic Architectures
- DeepLab (Google Research Post) - Google research project in the vein of Detectron
  - My gut says we'd never have enough data to train these big general kinds of models but who knows
- SegNet - Another generic architecture for semantic segmentation which I only mention because it was brought up along with U-Nets in this webinar on advances in medical image analysis
Comments from @nsamusik on some things to keep in mind:
My main thought at this point is that the segmentation itself is just the first step, there also has to be a second step, where cell boundaries are optimized concomitantly with estimating the single-cell expression vectors. This way both the optimized cell boundaries and the expression data will likely look more accurate.
As for the benchmarking, I am happy to share a hand-labelled dataset that I have generated for the CODEX paper revisions. Here, each TIFF is matched with a TXT file that contains the coordinates of hand-labelled cell centers (X, Y, Z). There are no cell outlines labelled here, just the centers. In order to assess the segmentation quality, I computed several measures: R = Recall (% of hand-labelled centers that ended up within a segmented cell region), S = Singlets (of those, the % that ended up in a cell region with exactly 1 hand-labelled center), FPR = False positive rate (% of cell regions without a hand-labelled center). Then I combined the three in a harmonic mean: 3/(1/R + 1/S + 1/(1-FPR))
Here's the link: https://drive.google.com/open?id=1wUNaZ5dv2mDn_wwcSXlnfof6SwoQmlsq
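A rough sketch of how the R / S / FPR score described above could be computed, in case it helps when wiring up the benchmark. Everything about the data layout here is an assumption on my part, not something taken from the dataset: the segmentation is taken to be an integer label volume indexed as [z, y, x] with 0 as background, the centers are taken to be zero-based (X, Y, Z) voxel coordinates inside the volume, and `segmentation_score` is a hypothetical name. The measures are kept as fractions rather than percentages.

```python
import numpy as np


def segmentation_score(labels, centers_xyz):
    """Score a label volume against hand-labelled cell centers.

    labels:      int array indexed [z, y, x], 0 = background (assumed layout).
    centers_xyz: (N, 3) array of (X, Y, Z) voxel coordinates (assumed zero-based).
    """
    centers = np.rint(np.asarray(centers_xyz, dtype=float)).astype(int).reshape(-1, 3)
    x, y, z = centers.T
    hit = labels[z, y, x]              # label under each center (0 = missed)
    inside = hit[hit > 0]              # centers that landed in some cell region

    # Number of hand-labelled centers per region that contains at least one.
    ids, counts = np.unique(inside, return_counts=True)
    n_regions = np.unique(labels[labels > 0]).size

    # R: fraction of centers that fall inside a segmented region.
    recall = inside.size / hit.size if hit.size else 0.0
    # S: of those, the fraction whose region holds exactly one center.
    if inside.size:
        centers_in_same_region = counts[np.searchsorted(ids, inside)]
        singlets = float(np.mean(centers_in_same_region == 1))
    else:
        singlets = 0.0
    # FPR: fraction of regions that contain no center at all.
    fpr = (n_regions - ids.size) / n_regions if n_regions else 0.0

    parts = (recall, singlets, 1.0 - fpr)
    # Harmonic mean of R, S and (1 - FPR); defined as 0 if any term is 0.
    score = 0.0 if min(parts) == 0 else 3.0 / sum(1.0 / p for p in parts)
    return {"R": recall, "S": singlets, "FPR": fpr, "score": score}


if __name__ == "__main__":
    toy = np.zeros((1, 4, 8), dtype=int)
    toy[:, :, 0:2], toy[:, :, 2:4], toy[:, :, 4:6] = 1, 2, 3
    print(segmentation_score(toy, [(0, 0, 0), (2, 0, 0), (3, 1, 0), (7, 0, 0)]))
```

The harmonic mean drops sharply when any one of the three terms is poor, so a segmentation cannot score well by, say, covering every center with a few huge merged regions.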
Recommendations from Allen Goodman (works on CellProfiler):
- https://github.com/broadinstitute/keras-rcnn
- https://github.com/raghakot/keras-resnet
Datasets for benchmarking:
- http://cocodataset.org/#home
- https://data.broadinstitute.org/bbbc/BBBC038/
Notes from the CellProfiler team on other methods they've considered:
- morphological (watershed)
- k-means
- diffusion
- geometric (Active Contour)
They also mentioned that simulation tools like cytopaq (used to generate some of the Broad Bioimage Benchmark Collection datasets) were not useful.
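For reference, the morphological (watershed) option in the list above works directly on 3D volumes. Below is a minimal sketch of the usual distance-transform recipe using scikit-image and SciPy. It is a generic illustration, not CellProfiler's watershed module: it assumes a boolean foreground mask has already been obtained (e.g. by thresholding the nuclear channel), and `split_touching_objects_3d`, `two_spheres` and the `min_distance` default are all hypothetical choices.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed


def split_touching_objects_3d(mask, min_distance=3):
    """Label a 3D boolean foreground mask, splitting touching blobs.

    Seeds are local maxima of the Euclidean distance transform; the
    watershed then floods the inverted distance map from those seeds.
    """
    distance = ndi.distance_transform_edt(mask)
    peaks = peak_local_max(distance, min_distance=min_distance, exclude_border=False)
    markers = np.zeros(mask.shape, dtype=np.int32)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # mask= keeps background voxels at label 0.
    return watershed(-distance, markers, mask=mask)


def two_spheres(shape=(20, 20, 32), centers=((10, 10, 10), (10, 10, 21)), radius=6):
    """Synthetic test volume: two overlapping spheres, indexed [z, y, x]."""
    zz, yy, xx = np.indices(shape)
    mask = np.zeros(shape, dtype=bool)
    for cz, cy, cx in centers:
        mask |= (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return mask


if __name__ == "__main__":
    labels = split_touching_objects_3d(two_spheres())
    print("labels at the two sphere centers:", labels[10, 10, 10], labels[10, 10, 21])
```

On real data the seeds are the fragile part: irregular or elongated nuclei produce several distance maxima each, so `min_distance` (or some smoothing of the distance map) would need tuning per dataset.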