sleuth icon indicating copy to clipboard operation
sleuth copied to clipboard

Improve obs_to_matrix with data.table

Open andrewrech opened this issue 6 years ago • 6 comments

Replace reshape2::dcast with data.table::dcast for speed gain and to avoid issue with long sample names causing vector return error in reshape2 but not data.table:

Using reshape2:

obs_counts <- dcast(obj$obs_norm, target_id ~ sample, value.var = value_name)
Aggregation function missing: defaulting to length

Error during wrapup: dims [product 868328] do not match the length of object [41674225]

But no error using data.table.

obj_norm to reproduce

Linux i-094bb338984186ac0 4.15.0-1031-aws #33-Ubuntu SMP Fri Dec 7 09:32:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sleuth_0.30.0     data.table_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0            bindr_0.1.1           magrittr_1.5.0.9000
 [4] tidyselect_0.2.5.9000 munsell_0.5.0.9000    colorspace_1.4-0
 [7] R6_2.3.0              rlang_0.3.1.9000      dplyr_0.7.8
[10] tools_3.5.2           parallel_3.5.2        grid_3.5.2
[13] rhdf5_2.24.0          gtable_0.2.0          lazyeval_0.2.1.9000
[16] assertthat_0.2.0.9000 tibble_2.0.1          crayon_1.3.4
[19] bindrcpp_0.2.2        purrr_0.3.0           Rhdf5lib_1.2.1
[22] ggplot2_3.1.0.9000    glue_1.3.0.9000       compiler_3.5.2
[25] pillar_1.3.1          scales_1.0.0.9000     pkgconfig_2.0.2

andrewrech avatar Feb 04 '19 23:02 andrewrech

Additionally fixed use of data.table::dcast on data.frame (this function does not implicitly convert)

andrewrech avatar Feb 05 '19 00:02 andrewrech

Hi @andrewrech,

Thank you for your work here! Two suggestions:

  1. A cleaner way to handle issue #135 is to add an #' @importFrom utils head statement to the documentation for sleuth_gene_table (between lines 1092 and 1093 of sleuth.R here), as well as sleuth_to_matrix(between lines 32 and 33 of matrix.R here). Once you've made the change, open an R session in the sleuth directory and use devtools::document() to update the NAMESPACE file.
  2. There are three calls to reshape2 in the plot_transcript_heatmap function in plots.R. If you convert those to data.table::dcast and then add a line after the if statements to convert tab_df to a data.frame, we can remove a dependency to reshape2 altogether (just deleting it from the "depends" list in the DESCRIPTION file). You should also run devtools::document() to make sure references to reshape2 are gone from the NAMESPACE file.

In both cases, you can run devtools::check() to see if there are any other issues.

warrenmcg avatar Mar 14 '19 14:03 warrenmcg

Also, I've switched to the devel branch for merging, as it allows us to combine this with other changes for the next stable release of sleuth.

warrenmcg avatar Mar 14 '19 14:03 warrenmcg

@warrenmcg Thank you for your help

  • [x] added importFrom
  • [x] update NAMESPACE
  • [x] check that no further namespace issues exist with devtools::check
  • [x] convert plots.R reshape2 calls
  • [x] remove reshape2 import from DESCRIPTION

then add a line after the if statements to convert tab_df to a data.frame

data.table::dcast returns a data frame if given a data frame; no need to convert back

data.table::dcast(test_df, time ~ variable, fun=mean) %>% str
'data.frame':   12 obs. of  2 variables:
 $ time  : num  0 2 4 6 8 10 12 14 16 18 ...
 $ weight: num  41.1 49.2 60 74.3 91.2 ...
  • [x] confirm reshape2 is not required using devtools::check

andrewrech avatar Mar 16 '19 15:03 andrewrech

@warrenmcg see my note above

andrewrech avatar Mar 16 '19 16:03 andrewrech

Hi @andrewrech, you are correct. Sorry about that! We will be reviewing everything next week to merge and hopefully have a new release soon! Thanks again for your help and effort! 👍

warrenmcg avatar Mar 16 '19 18:03 warrenmcg