Mutation data is all "1" for gCSI_2017 with either summary.stat value
Greetings!
While going through the gCSI_2017 dataset, I noticed that the mutation data appears to have been parsed incorrectly, or is otherwise uninformative: every non-missing value returned by a call to summarizeMolecularProfiles is the same, "1".
To Reproduce:
library(PharmacoGx)
library(SummarizedExperiment)
pset <- downloadPSet('gCSI_2017', saveDir = '/tmp')
# summary.stat = 'or'
se <- summarizeMolecularProfiles(pset, mDataType = 'mutation', summary.stat = 'or')
dat <- assay(se, 1)
table(dat == 1)
#
# TRUE
# 13480
#
# summary.stat = 'and'
se <- summarizeMolecularProfiles(pset, mDataType = 'mutation', summary.stat = 'and')
dat <- assay(se, 1)
table(dat == 1)  # as with 'or', every non-missing value is "1"
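A quick way to confirm that an assay carries no information is to list its distinct non-missing values. The snippet below is only a sketch: it runs on a small made-up matrix standing in for `dat`, and the helper name `distinct_values` is hypothetical, not part of PharmacoGx.

```r
# Hypothetical helper: counts of each distinct non-missing value in a matrix.
distinct_values <- function(m) {
  table(as.vector(m), useNA = "no")
}

# Toy stand-in for `dat` (made-up values, not actual gCSI data).
toy <- matrix(
  c("1", NA, "1", "1", NA, "1"),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

counts <- distinct_values(toy)

# A single distinct value means the assay cannot separate any samples.
is_constant <- length(counts) == 1

print(counts)
print(is_constant)
```

Running `distinct_values(dat)` on the real assay should show at a glance whether anything other than "1" is present.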
System information:
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
Matrix products: default
BLAS: /usr/lib/libopenblasp-r0.3.10.so
LAPACK: /usr/lib/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SummarizedExperiment_1.19.6 DelayedArray_0.15.7 matrixStats_0.56.0 Matrix_1.2-18
[5] Biobase_2.49.1 GenomicRanges_1.41.6 GenomeInfoDb_1.25.11 IRanges_2.23.10
[9] S4Vectors_0.27.12 BiocGenerics_0.35.4 PharmacoGx_2.1.10 CoreGx_1.1.4
[13] nvimcom_0.9-102
loaded via a namespace (and not attached):
[1] lsa_0.73.2 bitops_1.0-6 RColorBrewer_1.1-2 SnowballC_0.7.0 repr_1.1.0
[6] tools_4.0.2 R6_2.4.1 DT_0.15 KernSmooth_2.23-17 sm_2.2-5.6
[11] colorspace_1.4-1 tidyselect_1.1.0 gridExtra_2.3 curl_4.3 compiler_4.0.2
[16] shinyjs_2.0.0 slam_0.1-47 caTools_1.18.0 scales_1.1.1 relations_0.6-9
[21] stringr_1.4.0 digest_0.6.25 XVector_0.29.3 base64enc_0.1-3 pkgconfig_2.0.3
[26] htmltools_0.5.0 plotrix_3.7-8 fastmap_1.0.1 limma_3.45.14 maps_3.3.0
[31] htmlwidgets_1.5.1 rlang_0.4.7 shiny_1.5.0 visNetwork_2.0.9 generics_0.0.2
[36] jsonlite_1.7.1 txtplot_1.0-4 BiocParallel_1.23.2 gtools_3.8.2 dplyr_1.0.2
[41] RCurl_1.98-1.2 magrittr_1.5 GenomeInfoDbData_1.2.3 celestial_1.4.6 Rcpp_1.0.5
[46] munsell_0.5.0 lifecycle_0.2.0 stringi_1.5.3 piano_2.5.0 MASS_7.3-53
[51] RJSONIO_1.3-1.4 zlibbioc_1.35.0 plyr_1.8.6 gplots_3.0.4 grid_4.0.2
[56] gdata_2.18.0 promises_1.1.1 shinydashboard_0.7.1 crayon_1.3.4 lattice_0.20-41
[61] mapproj_1.2.7 knitr_1.29 pillar_1.4.6 fgsea_1.15.2 tcltk_4.0.2
[66] igraph_1.2.5 reshape2_1.4.4 marray_1.67.0 fastmatch_1.1-0 NISTunits_1.0.1
[71] glue_1.4.2 downloader_0.4 data.table_1.13.0 BiocManager_1.30.10 vctrs_0.3.4
[76] httpuv_1.5.4 testthat_2.3.2 RANN_2.6.1 gtable_0.3.0 purrr_0.3.4
[81] ggplot2_3.3.2 xfun_0.17 mime_0.9 skimr_2.1.2 xtable_1.8-4
[86] pracma_2.2.9 later_1.1.0.1 tibble_3.0.3 sets_1.0-18 cluster_2.1.0
[91] ellipsis_0.3.1 magicaxis_2.0.10
Hi @khughitt,
I will look into this and get back to you early next week.
Best, Chris
Hi @khughitt,
I just finished debugging your code. It looks like the issue is with the mutation data in the gCSI_2017 PharmacoSet itself. I am reaching out to my colleagues now to look into resolving it.
I will keep you updated on our progress and share the correct data as soon as it is available.
Best, Chris
Great! Thanks for taking the time to look into the issue and report it upstream!
Hey @khughitt,
Just checking in so you know we didn't forget about you. The problem with the mutation data goes all the way upstream to Genentech. We are currently working with them to resolve the issue, but it may take some time.
Best, Chris
Hi @ChristopherEeles,
No problem, thanks for taking the time to follow up!
Cheers, Keith