`cannot find an open port. For manually specifying the port, see ?SnowParamUsing previously downloaded VCF.`

Open • bschilder opened this issue 1 year ago • 1 comment

1. Bug description

import_sumstats works fine for hundreds of GWAS, then hits this error and rapidly iterates through all remaining GWAS IDs without actually processing them. Strangely, it also appends their logs to the log file of the GWAS that first hit the error.

This takes a very long time to reproduce (multiple days of continuous running). The GWAS being processed when the error occurred was not particularly large ("only" 11M SNPs).
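
The log-appending symptom would be consistent with a message sink that is opened for one GWAS and never closed when an error interrupts the run. That is an assumption, not something confirmed in the MungeSumstats source. A minimal base-R sketch of the defensive pattern follows; `with_msg_log` and `process_one` are hypothetical names, not MungeSumstats functions.

```r
# Hypothetical helper: run `expr_fun()` while diverting messages to `log_file`,
# and always restore the message stream, even if `expr_fun()` errors.
with_msg_log <- function(log_file, expr_fun) {
  con <- file(log_file, open = "wt")
  sink(con, type = "message")
  on.exit({
    sink(type = "message")  # send messages back to stderr
    close(con)              # release the connection slot
  }, add = TRUE)
  expr_fun()
}

# Hypothetical per-ID worker; it errors on one ID to mimic the port failure.
process_one <- function(id) {
  message("processing ", id)
  if (id == "id2") stop("cannot find an open port")
  id
}

ids <- c("id1", "id2", "id3")
log_dir <- tempfile("logs_")
dir.create(log_dir)

# Each ID gets its own log file and its own error boundary.
results <- lapply(ids, function(id) {
  log_file <- file.path(log_dir, paste0(id, "_log_msg.txt"))
  tryCatch(
    with_msg_log(log_file, function() process_one(id)),
    error = function(e) NA_character_
  )
})
names(results) <- ids
```

Because the cleanup lives in `on.exit()`, a failure on one ID cannot leave the sink pointing at that ID's log file while later IDs run.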

Possible explanations

  1. Multiple users on our private cloud are accidentally trying to use the same threads at the same time, and BiocParallel can't handle this gracefully?
  2. The virtual machine becomes temporarily disconnected from its dedicated resources. Perhaps a question for @eduff
  3. data.table is trying to run in parallel within each loop of read_vcf_parallel (which is also being run in parallel), causing a conflict with the same cores being requested for different tasks at once. Though I don't know why this wouldn't happen far earlier when processing 100s of GWAS.
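
To test explanation 3, one option is to cap data.table's own threads so that the outer and inner levels of parallelism cannot oversubscribe the same cores. The sketch below is only an illustration of that idea: `inner_threads` is a hypothetical helper, and the budget of 30 cores simply mirrors the `nThread = 30` used in this report.

```r
# Hypothetical helper: split a fixed core budget between an outer parallel loop
# and any multi-threaded code running inside each worker.
inner_threads <- function(total_cores, outer_workers) {
  stopifnot(total_cores >= 1, outer_workers >= 1)
  max(1L, as.integer(total_cores %/% outer_workers))
}

n_inner <- inner_threads(total_cores = 30, outer_workers = 30)

# Apply the budget to data.table only if it is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  old <- data.table::setDTthreads(n_inner)  # returns the previous setting
  # ... data.table work for this worker would go here ...
  data.table::setDTthreads(old)             # restore the previous setting
}
```

Note that a setting made in the main session does not necessarily carry over to separate worker processes, so in practice the `setDTthreads()` call would need to run inside the function executed by each worker.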

read_vcf_parallel:

The error seems to occur in read_vcf_parallel. This function seems rather finicky: it also fails when I specify >30 threads, though I suspect that's for a different reason (splitting a VCF across too many threads means that if some genome tiles are empty, the whole loop breaks, perhaps at the final re-merging step).
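
If empty tiles are indeed what breaks the re-merging step (a guess on my part), dropping them before the merge would be one way around it. The following base-R sketch shows the idea on toy data frames; `merge_tiles` is a hypothetical stand-in and does not reflect how read_vcf_parallel is actually implemented.

```r
# Hypothetical stand-in for the re-merging step: combine per-tile results,
# skipping tiles that returned NULL or zero rows.
merge_tiles <- function(tiles) {
  keep <- vapply(
    tiles,
    function(x) is.data.frame(x) && nrow(x) > 0L,
    logical(1)
  )
  if (!any(keep)) {
    return(data.frame())  # nothing to merge
  }
  merged <- do.call(rbind, tiles[keep])
  rownames(merged) <- NULL
  merged
}

tiles <- list(
  data.frame(SNP = c("rs1", "rs2"), BP = c(100L, 200L)),
  NULL,                                             # tile with no variants
  data.frame(SNP = character(0), BP = integer(0)),  # empty but well-formed tile
  data.frame(SNP = "rs3", BP = 300L)
)
merged <- merge_tiles(tiles)
```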

Related Issues

BiocParallel:

  • https://github.com/Bioconductor/BiocParallel/pull/187
  • https://github.com/Bioconductor/BiocParallel/issues/85
  • https://github.com/Bioconductor/BiocParallel/issues/106

Also, not sure if I'm the only one, but BiocParallel can be a bit tricky to use successfully.

Console output

Using local VCF.
File already tabix-indexed.
Finding empty VCF columns based on first 10,000 rows.
Dropping 1 duplicate columns.
1 sample detected: ubm-a-129
Constructing ScanVcfParam object.
VCF contains: 11,734,353 variant(s) x 1 sample(s)
Reading VCF file: multi-threaded (30 threads)
failed to open the port 11221, trying a new port...
failed to open the port 11596, trying a new port...
failed to open the port 11982, trying a new port...
failed to open the port 11329, trying a new port...
failed to open the port 11700, trying a new port...
  cannot find an open port. For manually specifying the port, see ?SnowParamUsing previously downloaded VCF.
Formatted summary statistics will be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/ubm-a-81.tsv.gz
Log data to be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/logs
Saving output messages to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/logs/MungeSumstats_log_msg.txt
Any runtime errors will be saved to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/logs/MungeSumstats_log_output.txt
Messages will not be printed to terminal.
all connections are in useUsing previously downloaded VCF.
Formatted summary statistics will be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/ubm-a-93.tsv.gz
Log data to be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/logs
Saving output messages to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/logs/MungeSumstats_log_msg.txt
Any runtime errors will be saved to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/logs/MungeSumstats_log_output.txt
Messages will not be printed to terminal.
...
...
...

Full logs file: ubm-a-129_log_msg.txt
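
The `all connections are in use` message in the console output is the error base R raises when the session's connection table is full (as far as I know, 128 slots in R versions of this era). If BiocParallel probes ports by opening a socket connection, which is an assumption here, an exhausted table would make every port look unavailable, and a slow connection leak over hundreds of GWAS would explain why the failure only appears after days. A minimal base-R sketch for checking this; `n_open_connections` is a hypothetical helper.

```r
# Count user-opened connections in this R session
# (stdin, stdout and stderr are excluded when all = FALSE).
n_open_connections <- function() {
  nrow(showConnections(all = FALSE))
}

before <- n_open_connections()

# Simulate a leak: open connections without closing them.
leaked <- lapply(1:3, function(i) file(tempfile(), open = "wt"))
during <- n_open_connections()

# Close them explicitly and confirm the count returns to the baseline.
invisible(lapply(leaked, close))
after <- n_open_connections()

message("open connections: before=", before, " during=", during, " after=", after)
```

Logging `n_open_connections()` once per GWAS in the loop would show whether the count creeps upward between iterations.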

Expected behaviour

Process all sumstats.

2. Reproducible example

Code

meta <- MungeSumstats::find_sumstats(subcategories = c("neurological","Immune","cardio"))

gwas_paths <- MungeSumstats::import_sumstats(
  ids = meta$id[1:400], 
  save_dir = here::here("data/GWAS_sumstats"), 
  nThread = 30, # >30 causes issues with read_vcf_parallel
  parallel_across_ids = FALSE, 
  force_new_vcf = FALSE,
  force_new = FALSE,
  vcf_download = TRUE,
  vcf_dir = here::here("data/VCFs"),
  ### axel will keep trying forever if the URL doesn't exist (or is private)
  # download_method = "axel",
  #### Record logs
  log_folder_ind = TRUE,
  log_mungesumstats_msgs = TRUE
)

3. Session info

R Under development (unstable) (2022-02-25 r81808)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomeInfoDb_1.33.3    IRanges_2.31.0         S4Vectors_0.35.1       BiocGenerics_0.43.1   
[5] dplyr_1.0.9            ggplot2_3.3.6          data.table_1.14.2      MungeSumstats_1.5.5   
[9] MAGMA.Celltyping_2.0.6

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                                  R.utils_2.12.0                             
  [3] tidyselect_1.1.2                            lme4_1.1-30                                
  [5] RSQLite_2.2.15                              AnnotationDbi_1.59.1                       
  [7] htmlwidgets_1.5.4                           grid_4.2.0                                 
  [9] BiocParallel_1.31.10                        munsell_0.5.0                              
 [11] codetools_0.2-18                            withr_2.5.0                                
 [13] colorspace_2.0-3                            Biobase_2.57.1                             
 [15] filelock_1.0.2                              knitr_1.39                                 
 [17] rstudioapi_0.13                             orthogene_1.3.1                            
 [19] SingleCellExperiment_1.19.0                 ggsignif_0.6.3                             
 [21] MatrixGenerics_1.9.1                        GenomeInfoDbData_1.2.8                     
 [23] bit64_4.0.5                                 rprojroot_2.0.3                            
 [25] vctrs_0.4.1                                 treeio_1.21.0                              
 [27] generics_0.1.3                              xfun_0.31                                  
 [29] BiocFileCache_2.5.0                         R6_2.5.1                                   
 [31] bitops_1.0-7                                cachem_1.0.6                               
 [33] gridGraphics_0.5-1                          DelayedArray_0.23.1                        
 [35] assertthat_0.2.1                            BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
 [37] promises_1.2.0.1                            BiocIO_1.7.1                               
 [39] scales_1.2.0                                gtable_0.3.0                               
 [41] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22    SNPlocs.Hsapiens.dbSNP155.GRCh38_0.99.22   
 [43] rlang_1.0.4                                 splines_4.2.0                              
 [45] rtracklayer_1.57.0                          rstatix_0.7.0                              
 [47] lazyeval_0.2.2                              gargle_1.2.0                               
 [49] broom_1.0.0                                 BiocManager_1.30.18                        
 [51] yaml_2.3.5                                  reshape2_1.4.4                             
 [53] abind_1.4-5                                 GenomicFeatures_1.49.5                     
 [55] backports_1.4.1                             httpuv_1.6.5                               
 [57] tools_4.2.0                                 ggplotify_0.1.0                            
 [59] ellipsis_0.3.2                              ggdendro_0.1.23                            
 [61] Rcpp_1.0.9                                  plyr_1.8.7                                 
 [63] progress_1.2.2                              zlibbioc_1.43.0                            
 [65] purrr_0.3.4                                 RCurl_1.98-1.8                             
 [67] prettyunits_1.1.1                           ggpubr_0.4.0                               
 [69] GenomicFiles_1.33.1                         BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000     
 [71] SummarizedExperiment_1.27.1                 fs_1.5.2                                   
 [73] here_1.0.1                                  magrittr_2.0.3                             
 [75] matrixStats_0.62.0                          hms_1.1.1                                  
 [77] patchwork_1.1.1                             mime_0.12                                  
 [79] evaluate_0.15                               xtable_1.8-4                               
 [81] XML_3.99-0.10                               EWCE_1.5.5                                 
 [83] gridExtra_2.3                               compiler_4.2.0                             
 [85] biomaRt_2.53.2                              tibble_3.1.8                               
 [87] crayon_1.5.1                                minqa_1.2.4                                
 [89] R.oo_1.25.0                                 htmltools_0.5.3                            
 [91] ggfun_0.0.6                                 later_1.3.0                                
 [93] tidyr_1.2.0                                 aplot_0.1.6                                
 [95] DBI_1.1.3                                   ExperimentHub_2.5.0                        
 [97] gprofiler2_0.2.1                            dbplyr_2.2.1                               
 [99] MASS_7.3-58                                 rappdirs_0.3.3                             
[101] boot_1.3-28                                 babelgene_22.3                             
[103] Matrix_1.4-1                                car_3.1-0                                  
[105] cli_3.3.0                                   R.methodsS3_1.8.2                          
[107] parallel_4.2.0                              SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20   
[109] GenomicRanges_1.49.0                        pkgconfig_2.0.3                            
[111] SNPlocs.Hsapiens.dbSNP144.GRCh38_0.99.20    GenomicAlignments_1.33.1                   
[113] plotly_4.10.0                               xml2_1.3.3                                 
[115] ggtree_3.5.1                                XVector_0.37.0                             
[117] yulab.utils_0.0.5                           stringr_1.4.0                              
[119] VariantAnnotation_1.43.2                    digest_0.6.29                              
[121] Biostrings_2.65.1                           rmarkdown_2.14                             
[123] HGNChelper_0.8.1                            tidytree_0.3.9                             
[125] restfulr_0.0.15                             curl_4.3.2                                 
[127] shiny_1.7.2                                 Rsamtools_2.13.3                           
[129] rjson_0.2.21                                nloptr_2.0.3                               
[131] lifecycle_1.0.1                             nlme_3.1-158                               
[133] jsonlite_1.8.0                              carData_3.0-5                              
[135] viridisLite_0.4.0                           limma_3.53.5                               
[137] BSgenome_1.65.2                             fansi_1.0.3                                
[139] pillar_1.8.0                                lattice_0.20-45                            
[141] homologene_1.4.68.19.3.27                   KEGGREST_1.37.3                            
[143] fastmap_1.1.0                               httr_1.4.3                                 
[145] googleAuthR_2.0.0                           interactiveDisplayBase_1.35.0              
[147] glue_1.6.2                                  RNOmni_1.0.0                               
[149] png_0.1-7                                   ewceData_1.5.0                             
[151] BiocVersion_3.16.0                          bit_4.0.4                                  
[153] stringi_1.7.8                               blob_1.2.3                                 
[155] AnnotationHub_3.5.0                         memoise_2.0.1                              
[157] ape_5.6-2  

bschilder • Aug 06 '22 18:08