scanpy
Take n_bins into account for the cell_ranger flavor of highly_variable_genes
I've changed one line in the highly_variable_genes function so that n_bins is taken into account by the cell_ranger flavor (currently, only the seurat flavor uses this parameter).
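To illustrate what "taking n_bins into account" means, here is a minimal sketch of percentile-based binning driven by n_bins. It assumes that the cell_ranger flavor bins genes by percentiles of their mean expression, uses pd.cut, and pads the edges with -inf and +inf. The helper cell_ranger_bin_edges and the synthetic means array are hypothetical and not part of scanpy's API.

```python
import numpy as np
import pandas as pd


def cell_ranger_bin_edges(means: np.ndarray, n_bins: int = 20) -> np.ndarray:
    """Hypothetical helper: bin edges splitting `means` into n_bins equal-sized bins.

    The n_bins - 1 interior edges sit at the 100/n_bins, 2*100/n_bins, ...,
    (n_bins - 1)*100/n_bins percentiles; -inf and +inf are added as outer
    edges so that every gene falls into exactly one bin.
    """
    interior = np.percentile(means, np.arange(1, n_bins) * 100 / n_bins)
    return np.r_[-np.inf, interior, np.inf]


# Synthetic per-gene mean expression values (illustration only).
rng = np.random.default_rng(0)
means = rng.lognormal(size=1000)

edges = cell_ranger_bin_edges(means, n_bins=10)
mean_bin = pd.cut(means, edges)  # Categorical: one interval per gene
counts = np.bincount(mean_bin.codes, minlength=len(edges) - 1)
print(len(edges) - 1, counts)  # 10 bins of ~100 genes each
```

Changing n_bins changes both the number of edges and their spacing, so every bin holds roughly len(means) / n_bins genes.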
Additionally, I noticed that in the current version the bins are slightly offset: after the -inf edge, the percentile edges start at 10 instead of 5, so the first bin contains twice as many genes as each of the other bins. I don't know whether this is intentional (for example, to exactly reproduce the results of Cell Ranger). In the version I suggest, I have removed this offset. As a consequence, with the default n_bins=20, my version does not exactly reproduce the results of the previous version. To reproduce the current results exactly, we would have to keep the offset by using range(2, n_bins+1) instead of range(1, n_bins).
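The effect of the offset can be checked by counting genes per bin under both edge definitions. This sketch assumes that the edges are percentiles of the per-gene means computed with np.percentile, padded with -inf and +inf, and applied with pd.cut; the synthetic means array and the genes_per_bin helper are hypothetical.

```python
import numpy as np
import pandas as pd


def genes_per_bin(values: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Hypothetical helper: number of values falling into each (left, right] bin."""
    codes = pd.cut(values, edges).codes  # bin index of every gene
    return np.bincount(codes, minlength=len(edges) - 1)


rng = np.random.default_rng(0)
means = rng.lognormal(size=2000)  # synthetic per-gene mean expression values
n_bins = 20
step = 100 / n_bins  # 5 percentage points per bin

# Offset edges, range(2, n_bins + 1): 10th, 15th, ..., 100th percentiles.
old_edges = np.r_[-np.inf, np.percentile(means, np.arange(2, n_bins + 1) * step), np.inf]
# Edges without offset, range(1, n_bins): 5th, 10th, ..., 95th percentiles.
new_edges = np.r_[-np.inf, np.percentile(means, np.arange(1, n_bins) * step), np.inf]

old_counts = genes_per_bin(means, old_edges)
new_counts = genes_per_bin(means, new_edges)
print(old_counts)  # first bin holds 10% of genes, last (100th percentile, inf] bin is empty
print(new_counts)  # every bin holds 5% of genes
```

Under these assumptions, both definitions yield 20 intervals, but with the offset the first bin takes 200 of the 2000 genes and the final bin above the 100th percentile stays empty, whereas without the offset each bin receives 100 genes.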
Codecov Report
All modified and coverable lines are covered by tests ✅
Comparison is base (86b85ee) 72.72% compared to head (f233d75) 72.72%.
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #624   +/-   ##
=======================================
  Coverage   72.72%   72.72%
=======================================
  Files         111      111
  Lines       12384    12384
=======================================
  Hits         9006     9006
  Misses       3378     3378
| Files | Coverage Δ |
|---|---|
| scanpy/preprocessing/_highly_variable_genes.py | 96.17% <ø> (ø) |