Mutation data all "1" for gCSI_2017 using either summary.stat

Open khughitt opened this issue 5 years ago • 5 comments

Greetings!

In going through the gCSI_2017 dataset, I noticed that the mutation data appears to have either been incorrectly parsed, or is otherwise not very informative: all non-missing values returned by a call to summarizeMolecularProfiles have the same value, "1".

To Reproduce:

library(PharmacoGx)
library(SummarizedExperiment)

pset <- downloadPSet('gCSI_2017', saveDir = '/tmp')

# summary.stat = 'or'
se <- summarizeMolecularProfiles(pset, mDataType = 'mutation', summary.stat = 'or')
dat <- assay(se, 1)

table(dat == 1)
#
#  TRUE
# 13480

# summary.stat = 'and'
se <- summarizeMolecularProfiles(pset, mDataType = 'mutation', summary.stat = 'and')
dat <- assay(se, 1)
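
Note that table() drops missing values by default, so the output above only counts the non-missing entries. A minimal sketch of a fuller check follows, using a small toy matrix in place of the real assay. The toy data, the row and column names, and the helper name count_values are all hypothetical, and the sketch assumes the values are stored as character strings, as the "1" above suggests.

```r
# Toy stand-in for assay(se, 1): a character matrix holding "1" and NA.
# (Hypothetical data; the real matrix comes from summarizeMolecularProfiles.)
dat <- matrix(
  c("1", NA, "1", NA, NA, "1"),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# Hypothetical helper: count each distinct value, keeping missing values
# visible instead of silently dropping them.
count_values <- function(m) {
  table(as.vector(m), useNA = "ifany")
}

counts <- count_values(dat)
print(counts)

# Distinct non-missing values. A single distinct value means the matrix
# carries no contrast between mutated and non-mutated samples.
distinct <- unique(dat[!is.na(dat)])
print(distinct)
```

Running the same two steps on the real dat for each summary.stat setting would show whether anything other than "1" and NA ever occurs.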

System information:

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.10.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] SummarizedExperiment_1.19.6 DelayedArray_0.15.7         matrixStats_0.56.0          Matrix_1.2-18
 [5] Biobase_2.49.1              GenomicRanges_1.41.6        GenomeInfoDb_1.25.11        IRanges_2.23.10
 [9] S4Vectors_0.27.12           BiocGenerics_0.35.4         PharmacoGx_2.1.10           CoreGx_1.1.4
[13] nvimcom_0.9-102

loaded via a namespace (and not attached):
 [1] lsa_0.73.2             bitops_1.0-6           RColorBrewer_1.1-2     SnowballC_0.7.0        repr_1.1.0
 [6] tools_4.0.2            R6_2.4.1               DT_0.15                KernSmooth_2.23-17     sm_2.2-5.6
[11] colorspace_1.4-1       tidyselect_1.1.0       gridExtra_2.3          curl_4.3               compiler_4.0.2
[16] shinyjs_2.0.0          slam_0.1-47            caTools_1.18.0         scales_1.1.1           relations_0.6-9
[21] stringr_1.4.0          digest_0.6.25          XVector_0.29.3         base64enc_0.1-3        pkgconfig_2.0.3
[26] htmltools_0.5.0        plotrix_3.7-8          fastmap_1.0.1          limma_3.45.14          maps_3.3.0
[31] htmlwidgets_1.5.1      rlang_0.4.7            shiny_1.5.0            visNetwork_2.0.9       generics_0.0.2
[36] jsonlite_1.7.1         txtplot_1.0-4          BiocParallel_1.23.2    gtools_3.8.2           dplyr_1.0.2
[41] RCurl_1.98-1.2         magrittr_1.5           GenomeInfoDbData_1.2.3 celestial_1.4.6        Rcpp_1.0.5
[46] munsell_0.5.0          lifecycle_0.2.0        stringi_1.5.3          piano_2.5.0            MASS_7.3-53
[51] RJSONIO_1.3-1.4        zlibbioc_1.35.0        plyr_1.8.6             gplots_3.0.4           grid_4.0.2
[56] gdata_2.18.0           promises_1.1.1         shinydashboard_0.7.1   crayon_1.3.4           lattice_0.20-41
[61] mapproj_1.2.7          knitr_1.29             pillar_1.4.6           fgsea_1.15.2           tcltk_4.0.2
[66] igraph_1.2.5           reshape2_1.4.4         marray_1.67.0          fastmatch_1.1-0        NISTunits_1.0.1
[71] glue_1.4.2             downloader_0.4         data.table_1.13.0      BiocManager_1.30.10    vctrs_0.3.4
[76] httpuv_1.5.4           testthat_2.3.2         RANN_2.6.1             gtable_0.3.0           purrr_0.3.4
[81] ggplot2_3.3.2          xfun_0.17              mime_0.9               skimr_2.1.2            xtable_1.8-4
[86] pracma_2.2.9           later_1.1.0.1          tibble_3.0.3           sets_1.0-18            cluster_2.1.0
[91] ellipsis_0.3.1         magicaxis_2.0.10

khughitt, Oct 02 '20 20:10

Hi @khughitt,

I will look into this and get back to you early next week.

Best, Chris

ChristopherEeles, Oct 02 '20 20:10

Hi @khughitt,

I just ran through debugging for your code. Looks like the issue is with the gCSI_2017 PharmacoSet mutation data. I am reaching out to my colleagues now to look into resolving the issue.

I will keep you updated on our progress and share the correct data as soon as it is available.

Best, Chris

ChristopherEeles, Oct 06 '20 16:10

Great! Thanks for taking the time to look into the issue and report it upstream!

khughitt, Oct 06 '20 19:10

Hey @khughitt,

Just checking in so you know we didn't forget about you. The problem with the mutation data goes all the way upstream to Genentech. We are currently working with them to resolve the issue but it may take some time.

Best, Chris

ChristopherEeles, Oct 21 '20 16:10

Hi @ChristopherEeles

No problem. Thanks for taking the time to follow up!

Cheers, Keith

khughitt, Oct 22 '20 14:10