Take n_bins into account for the cell_ranger flavor of highly_variable_genes

Open e-sollier opened this issue 5 years ago • 1 comment

I've changed one line in the highly_variable_genes function so that n_bins is also taken into account with the cell_ranger flavor (currently, only the seurat flavor uses this parameter).

Additionally, I have noticed that the bins are slightly offset in the current version: after -inf, the first bin edge is at the 10th percentile instead of the 5th, so the first bin contains twice as many genes as each of the other bins. I don't know whether this is intentional (for example, to exactly reproduce the results of cell ranger). In the version that I suggest, I have removed this offset. As a consequence, with the default n_bins=20, my new version does not exactly reproduce the results of the previous version. To reproduce the current results exactly, we would have to keep the offset by using range(2,n_bins+1) instead of range(1,n_bins).
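
To make the offset concrete, here is a minimal sketch of the two binning variants. It is an illustration, not the actual scanpy code: the helpers `percentile_cuts` and `genes_per_bin`, the `keep_offset` flag, and the toy `means` array are all hypothetical, and it assumes the bin edges are percentiles of the mean expression with -inf and +inf as the outer edges.

```python
import numpy as np


def percentile_cuts(n_bins, keep_offset=False):
    """Interior percentile cut points (hypothetical helper, not scanpy API)."""
    step = 100.0 / n_bins
    # keep_offset=True mimics range(2, n_bins + 1); False mimics range(1, n_bins).
    ks = range(2, n_bins + 1) if keep_offset else range(1, n_bins)
    return [k * step for k in ks]


def genes_per_bin(means, n_bins, keep_offset=False):
    """Count genes per bin, with -inf and +inf as the implicit outer edges."""
    cuts = np.percentile(means, percentile_cuts(n_bins, keep_offset))
    # right=True assigns each value to a right-closed interval (lo, hi].
    idx = np.digitize(means, cuts, right=True)
    return np.bincount(idx, minlength=len(cuts) + 1)


# Toy data (an assumption): 1000 genes with distinct mean expression values.
means = np.arange(1, 1001, dtype=float)
no_offset = genes_per_bin(means, n_bins=20)
with_offset = genes_per_bin(means, n_bins=20, keep_offset=True)

# Compare the first few bin sizes of each variant.
print("range(1, n_bins):     ", no_offset[:3])
print("range(2, n_bins + 1): ", with_offset[:3])
```

On this toy input, the variant without the offset gives equally sized bins, while the variant with the offset puts twice as many genes in the first bin as in the second.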

e-sollier avatar Apr 26 '19 15:04 e-sollier

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (86b85ee) 72.72% compared to head (f233d75) 72.72%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #624   +/-   ##
=======================================
  Coverage   72.72%   72.72%           
=======================================
  Files         111      111           
  Lines       12384    12384           
=======================================
  Hits         9006     9006           
  Misses       3378     3378           
| Files | Coverage Δ |
|---|---|
| scanpy/preprocessing/_highly_variable_genes.py | 96.17% <ø> (ø) |

codecov[bot] avatar Jan 19 '24 12:01 codecov[bot]