matrixStats
matrixStats copied to clipboard
added colscale/rowscale functions
row/col Scale functions implement a faster version of scaling than scale
in R. They have slightly different behavior than scale when center = FALSE
, which is in the notes.
Hey Henrik,
Do you think this would be useful?
Hi @muschellij2, thank you very much for proposing this and I'm sorry for the delay (been/am swamped). I saw it when you committed it and I also so your blog post.
I'm currently taking a rather conservative / slow-phase approach when it comes to adding new features to matrixStats. The main reason is that in order to keep the matrixStats API stable and consistent in the long run, so anything that goes in needs to have been considered for quite sometime. You can see that there are several feature requests already that's been in there for quite some time. Some of these are more straightforward than others and they're more likely to come in sooner.
Feedback on implementation
I think your proposal could be made more efficient if:
- it avoid subset
x <- x[rows,cols]
but instead pass those as arguments torowMeans()
androwSds()
(but I'm not 100% sure so benchmarking may be needed, e.g. with and without centering may make a difference) - avoids allocating auxiliary vectors
cm
andcsd
unless really needed. This would lower the memory usage unnecessary calculations, particularly whenx
is large.
Feedback on how to contribute to speed things up
I should write some type of CONTRIBUTE
file with some guidelines. For instance, it would help much (and save lots of time) if the same coding standard is used as elsewhere in the package, e.g. <-
instead of =
, not use return()
unless needed, naming styles for arguments etc. Sounds picky, but it adds up in time when trying to merge/bring in. Also, before releasing new functionality to the matrixStats package, lots and lots of tests need to be written trying to cover missing values, no missing values, empty vectors/matrices, corner cases, integer and double data types etc. I'd say, when it comes to matrixStats, complete test coverage is critical in order for the package to be trusted.
I'm starting to sound like the reply you get when someone tries to propose improvements to core R ;) - but please don't read this as I don't appreciate suggestions/PRs - I do - it's just that I'm already scraping with the 24 hours/day we have to play with. But the more complete of a case you have the greater its chances are for bubbling up to the top of the set of features to be added next.
Codecov Report
Merging #81 into master will decrease coverage by
0.91%
. The diff coverage is0%
.
@@ Coverage Diff @@
## master #81 +/- ##
=========================================
- Coverage 89.61% 88.7% -0.92%
=========================================
Files 109 110 +1
Lines 4139 4177 +38
=========================================
- Hits 3709 3705 -4
- Misses 430 472 +42
Impacted Files | Coverage Δ | |
---|---|---|
R/rowScale.R | 0% <0%> (ø) |
|
src/weightedMedian_lowlevel_template.h | 93.38% <0%> (-3.31%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 06f1d9c...6bfabc5. Read the comment docs.
Codecov Report
Merging #81 into master will decrease coverage by
0.81%
. The diff coverage is0%
.
@@ Coverage Diff @@
## master #81 +/- ##
==========================================
- Coverage 89.61% 88.79% -0.82%
==========================================
Files 109 110 +1
Lines 4139 4177 +38
==========================================
Hits 3709 3709
- Misses 430 468 +38
Impacted Files | Coverage Δ | |
---|---|---|
R/rowScale.R | 0% <0%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 06f1d9c...6bfabc5. Read the comment docs.