highly deviant genes implementation
An implementation of highly deviant gene identification from the 2019 GLMPCA paper. I'm rather fond of the method, as it's a straightforward statistical measure, and comes with significance testing as a form of data-driven cutoff.
I put it in a new `highly_deviant_genes()` function, as:

- it comes with a number of unique parameters, and there are only so many different algorithms `highly_variable_genes()` can house
- the paper argues that highly deviant is different from highly variable
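For context, a minimal NumPy/SciPy sketch of the binomial deviance statistic from the GLM-PCA paper, including the chi-squared approximation that gives the data-driven cutoff mentioned above. This is an illustration of the method, not the code in this PR; the function name, signature, and use of a dense array are assumptions.

```python
import numpy as np
from scipy import stats


def binomial_deviance(counts):
    """Per-gene binomial deviance under a null model of a constant
    gene proportion across cells (sketch of the GLM-PCA paper's statistic).

    counts : (n_cells, n_genes) array of raw counts (dense, for illustration).
    Returns (deviance, p_values), each of length n_genes.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1, keepdims=True)   # total counts per cell
    pi = counts.sum(axis=0) / n.sum()       # null per-gene proportions
    mu = n * pi                             # expected counts under the null
    # deviance terms 2*[y*log(y/mu) + (n-y)*log((n-y)/(n-mu))],
    # with the convention 0*log(0) == 0
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = counts * np.log(counts / mu)
        t2 = (n - counts) * np.log((n - counts) / (n - mu))
    dev = 2.0 * (np.nan_to_num(t1) + np.nan_to_num(t2)).sum(axis=0)
    # approximate null distribution: chi-squared with (n_cells - 1) df,
    # which is what allows a significance-based cutoff instead of a fixed n_top
    pvals = stats.chi2.sf(dev, df=counts.shape[0] - 1)
    return dev, pvals
```

A gene whose proportion of each cell's counts is constant gets a deviance near zero; genes whose counts deviate from the constant-proportion null score higher and can be thresholded by p-value rather than by an arbitrary number of top genes.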
I acknowledge that there are no tests, I'm hoping to get some assistance with that if possible.
Codecov Report
Merging #1765 (3569f57) into master (560bd5d) will decrease coverage by 0.38%. The diff coverage is 19.27%.
```diff
@@            Coverage Diff             @@
##           master    #1765      +/-   ##
==========================================
- Coverage   71.18%   70.80%   -0.39%
==========================================
  Files          92       93       +1
  Lines       11190    11273      +83
==========================================
+ Hits         7966     7982      +16
- Misses       3224     3291      +67
```
| Impacted Files | Coverage Δ |
|---|---|
| scanpy/preprocessing/_highly_deviant_genes.py | 18.29% <18.29%> (ø) |
| scanpy/preprocessing/__init__.py | 100.00% <100.00%> (ø) |
I like that this method is fairly simple and could provide a meaningful cutoff, but I think I'd like more evidence of its usefulness before thinking about including it.
I have two main points of concern:
- Are there examples of this method being used outside of the glmPCA paper? I would at least like to know that reasonable results can be found downstream of this.
- In the glmPCA paper, the identified genes are highly correlated (~1) with highly expressed genes, but only weakly correlated (~0.3) with highly variable gene selection. While I'm not sure which highly variable gene method they compared against, should the low correlation with common practice give us pause?

@giovp