matrixStats icon indicating copy to clipboard operation
matrixStats copied to clipboard

added colscale/rowscale functions

Open muschellij2 opened this issue 8 years ago • 4 comments

row/col Scale functions implement a faster version of scaling than scale in R. They have slightly different behavior than scale when center = FALSE, which is in the notes.

muschellij2 avatar Feb 23 '16 20:02 muschellij2

Hey Henrik,

Do you think this would be useful?

muschellij2 avatar Apr 06 '16 19:04 muschellij2

Hi @muschellij2, thank you very much for proposing this and I'm sorry for the delay (been/am swamped). I saw it when you committed it and I also so your blog post.

I'm currently taking a rather conservative / slow-phase approach when it comes to adding new features to matrixStats. The main reason is that in order to keep the matrixStats API stable and consistent in the long run, so anything that goes in needs to have been considered for quite sometime. You can see that there are several feature requests already that's been in there for quite some time. Some of these are more straightforward than others and they're more likely to come in sooner.

Feedback on implementation

I think your proposal could be made more efficient if:

  • it avoid subset x <- x[rows,cols] but instead pass those as arguments to rowMeans() and rowSds() (but I'm not 100% sure so benchmarking may be needed, e.g. with and without centering may make a difference)
  • avoids allocating auxiliary vectors cm and csd unless really needed. This would lower the memory usage unnecessary calculations, particularly when x is large.

Feedback on how to contribute to speed things up

I should write some type of CONTRIBUTE file with some guidelines. For instance, it would help much (and save lots of time) if the same coding standard is used as elsewhere in the package, e.g. <- instead of =, not use return() unless needed, naming styles for arguments etc. Sounds picky, but it adds up in time when trying to merge/bring in. Also, before releasing new functionality to the matrixStats package, lots and lots of tests need to be written trying to cover missing values, no missing values, empty vectors/matrices, corner cases, integer and double data types etc. I'd say, when it comes to matrixStats, complete test coverage is critical in order for the package to be trusted.

I'm starting to sound like the reply you get when someone tries to propose improvements to core R ;) - but please don't read this as I don't appreciate suggestions/PRs - I do - it's just that I'm already scraping with the 24 hours/day we have to play with. But the more complete of a case you have the greater its chances are for bubbling up to the top of the set of features to be added next.

HenrikBengtsson avatar Apr 06 '16 20:04 HenrikBengtsson

Codecov Report

Merging #81 into master will decrease coverage by 0.91%. The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #81      +/-   ##
=========================================
- Coverage   89.61%   88.7%   -0.92%     
=========================================
  Files         109     110       +1     
  Lines        4139    4177      +38     
=========================================
- Hits         3709    3705       -4     
- Misses        430     472      +42
Impacted Files Coverage Δ
R/rowScale.R 0% <0%> (ø)
src/weightedMedian_lowlevel_template.h 93.38% <0%> (-3.31%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 06f1d9c...6bfabc5. Read the comment docs.

codecov-io avatar Dec 27 '18 19:12 codecov-io

Codecov Report

Merging #81 into master will decrease coverage by 0.81%. The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #81      +/-   ##
==========================================
- Coverage   89.61%   88.79%   -0.82%     
==========================================
  Files         109      110       +1     
  Lines        4139     4177      +38     
==========================================
  Hits         3709     3709              
- Misses        430      468      +38
Impacted Files Coverage Δ
R/rowScale.R 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 06f1d9c...6bfabc5. Read the comment docs.

codecov-io avatar Dec 27 '18 19:12 codecov-io