Add multiple BAM files with modification tags
Hi,
I am a newbie. Could you please help me ? Many thanks!
Describe the bug I do not know how to work with multiple modbam files (generated with dorado)
To Reproduce
exon_tibble <- get_exons_hg19()
head(exon_tibble)
# A tibble: 6 × 7
gene_id chr strand start end transcript_id symbol
<chr> <chr> <chr> <int> <int> <int> <chr>
1 1 chr19 - 58858172 58858395 70455 A1BG
2 1 chr19 - 58858719 58859006 70455 A1BG
3 1 chr19 - 58861736 58862017 70455 A1BG
4 1 chr19 - 58862757 58863053 70455 A1BG
5 1 chr19 - 58863649 58863921 70455 A1BG
6 1 chr19 - 58864294 58864563 70455 A1BG
infile <- stringr::str_sort(list.files("/Volumes/Janet/bam_merged"), num = TRUE)
length(infile)
[1] 16
infile
[1] "SL7-1.bam"
[2] "SL7-2.bam"
[3] "SL19-1.bam"
[4] "SL19-2.bam"
[5] "SL34-1.bam"
[6] "SL34-2.bam"
[7] "SL44-1.bam"
[8] "SL44-2.bam"
[9] "SL45-1.bam"
[10] "SL45-2.bam"
[11] "SL46-1.bam"
[12] "SL46-2.bam"
[13] "SL54-1.bam"
[14] "SL54-2.bam"
[15] "SL56-1.bam"
[16] "SL56-2.bam"
sample <- c("7-1", "7-2","19-1", "19-2","34-1", "34-2","44-1", "44-2","45-1", "45-2","46-1", "46-2","54-1", "54-2","56-1", "56-2")
group <- c("before", "after", "before", "after","before", "after","before", "after","before", "after","before", "after","before", "after","before", "after")
sample_anno <- data.frame(sample, group, stringsAsFactors = FALSE)
sample_anno
sample group
1 7-1 before
2 7-2 after
3 19-1 before
4 19-2 after
5 34-1 before
6 34-2 after
7 44-1 before
8 44-2 after
9 45-1 before
10 45-2 after
11 46-1 before
12 46-2 after
13 54-1 before
14 54-2 after
15 56-1 before
16 56-2 after
mbr <- ModBamResult(methy = ModBamFiles(samples = infile, system.file(package = "NanoMethViz", infile)), samples = data.frame(sample = sample, group = group), exons = exon_tibble)
**Error in assert_readable(paths) : Path '' does not exist**
Additional context I tried with the full file path instead which did not work either.
Session info
Please include the results of sessionInfo()
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] C/UTF-8/C/C/C/C
time zone: Europe/Paris
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] org.Hs.eg.db_3.19.1
[2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[3] GenomicFeatures_1.56.0
[4] AnnotationDbi_1.66.0
[5] Biobase_2.64.0
[6] GenomicRanges_1.56.1
[7] GenomeInfoDb_1.40.1
[8] IRanges_2.38.0
[9] S4Vectors_0.42.0
[10] BiocGenerics_0.50.0
[11] BiocManager_1.30.23
[12] dplyr_1.1.4
[13] NanoMethViz_3.0.2
[14] ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] DBI_1.2.3 bitops_1.0-7
[3] bsseq_1.40.0 permute_0.9-7
[5] rlang_1.1.4 magrittr_2.0.3
[7] RSQLite_2.3.7 matrixStats_1.3.0
[9] e1071_1.7-14 compiler_4.4.1
[11] DelayedMatrixStats_1.26.0 png_0.1-8
[13] vctrs_0.6.5 stringr_1.5.1
[15] fastmap_1.2.0 pkgconfig_2.0.3
[17] crayon_1.5.3 XVector_0.44.0
[19] utf8_1.2.4 Rsamtools_2.20.0
[21] tzdb_0.4.0 UCSC.utils_1.0.0
[23] ggbeeswarm_0.7.2 bit_4.0.5
[25] purrr_1.0.2 cachem_1.1.0
[27] zlibbioc_1.50.0 beachmat_2.20.0
[29] jsonlite_1.8.8 blob_1.2.4
[31] rhdf5filters_1.16.0 DelayedArray_0.30.1
[33] scico_1.5.0 Rhdf5lib_1.26.0
[35] BiocParallel_1.38.0 irlba_2.3.5.1
[37] parallel_4.4.1 R6_2.5.1
[39] stringi_1.8.4 limma_3.60.3
[41] rtracklayer_1.64.0 Rcpp_1.0.12
[43] assertthat_0.2.1 SummarizedExperiment_1.34.0
[45] R.utils_2.12.3 readr_2.1.5
[47] Matrix_1.7-0 tidyselect_1.2.1
[49] abind_1.4-5 yaml_2.3.8
[51] codetools_0.2-20 curl_5.2.1
[53] lattice_0.22-6 tibble_3.2.1
[55] KEGGREST_1.44.1 withr_3.0.0
[57] ggrastr_1.0.2 proxy_0.4-27
[59] Biostrings_2.72.1 pillar_1.9.0
[61] MatrixGenerics_1.16.0 generics_0.1.3
[63] dbscan_1.1-12 RCurl_1.98-1.14
[65] hms_1.1.3 sparseMatrixStats_1.16.0
[67] munsell_0.5.1 scales_1.3.0
[69] gtools_3.9.5 class_7.3-22
[71] glue_1.7.0 tools_4.4.1
[73] BiocIO_1.14.0 data.table_1.15.4
[75] ScaledMatrix_1.12.0 BSgenome_1.72.0
[77] locfit_1.5-9.9 GenomicAlignments_1.40.0
[79] forcats_1.0.0 fs_1.6.4
[81] cpp11_0.4.7 XML_3.99-0.16.1
[83] rhdf5_2.48.0 grid_4.4.1
[85] tidyr_1.3.1 colorspace_2.1-0
[87] GenomeInfoDbData_1.2.12 patchwork_1.2.0
[89] beeswarm_0.4.0 BiocSingular_1.20.0
[91] HDF5Array_1.32.0 restfulr_0.0.15
[93] vipor_0.4.7 cli_3.6.3
[95] rsvd_1.0.5 fansi_1.0.6
[97] S4Arrays_1.4.1 gtable_0.3.5
[99] R.methodsS3_1.8.2 SparseArray_1.4.8
[101] rjson_0.2.21 memoise_2.0.1
[103] R.oo_1.26.0 lifecycle_1.0.4
[105] httr_1.4.7 statmod_1.5.0
[107] bit64_4.0.5
Hi there,
Try changing
mbr <- ModBamResult(methy = ModBamFiles(samples = infile, system.file(package = "NanoMethViz", infile)), samples = data.frame(sample = sample, group = group), exons = exon_tibble)
to
mbr <- ModBamResult(methy = ModBamFiles(samples = sample_anno$sample, paths = infile), samples = sample_anno, exons = exon_tibble)
The system.file() function is used to get data from within the package, so it was trying to search for files that didn't exist inside the package.
Thank you for your quick answer! So I tried and I get this now:
infile <- stringr::str_sort(list.files("/Volumes/Janet/bam_merged"), num = TRUE)
length(infile)
[1] 16
infile
[1] "SL7-1.bam"
[2] "SL7-2.bam"
[3] "SL19-1.bam"
[4] "SL19-2.bam"
[5] "SL34-1.bam"
[6] "SL34-2.bam"
[7] "SL44-1.bam"
[8] "SL44-2.bam"
[9] "SL45-1.bam"
[10] "SL45-2.bam"
[11] "SL46-1.bam"
[12] "SL46-2.bam"
[13] "SL54-1.bam"
[14] "SL54-2.bam"
[15] "SL56-1.bam"
[16] "SL56-2.bam"
mbr <- ModBamResult(methy = ModBamFiles(samples = sample_anno$sample, paths = infile), samples = sample_anno, exons = exon_tibble)
Error in assert_readable(paths) :
paths 'SL7-1.bam', 'SL7-2.bam',
Shall I mention that the .bai are located in a separated folder (is this ok?)
The index files need to be in the same directory and also the path needs to be either the full path or the full relative path from where your working directory is.
That was that! It works thank you so much! May I ask two last questions:
1- how can I use the ModBamResult with methy_to_bsseq() function? I got this message
bss <- methy_to_bsseq(mbr)
[2024-06-26 09:53:24] creating intermediate files...
Error in enc2utf8(path) : argument is not a character vector
2- There is no gene panel generated underneath the heatmap when plotting, is it because my bam files contain NCBI chromosome names instead of UCSC ones (e.g. CM000678.1 instead of chr16)?
Hi @Shians ,
I'm also trying to load bam files from dorado, but R can't find the ModBamResult and ModBamFiles functions:
> mbr <- ModBamResult(
+ methy = ModBamFiles(
+ samples = sample_table$isolate,
+ sample_table$path
+ ),
+ samples = sample_table,
+ exons = gene_annotation
+ )
Error in ModBamResult(methy = ModBamFiles(samples = sample_table$isolate, :
could not find function "ModBamResult"
> methy <- ModBamFiles(
+ samples = sample_table$isolate,
+ sample_table$path
+ )
Error in ModBamFiles(samples = sample_table$isolate, sample_table$path) :
could not find function "ModBamFiles"
I'm using NanoMethViz v3.1.6 (2024-09-19 [1] Github (Shians/NanoMethViz@5a411a6)). Any help would be appreciated...
Thanks, Ido
Hi Ido,
That's very strange, it should definitely be there in the current version. Can you reload your R instance, the package and try again?
If it still doesn't work, could you paste the results of sessionInfo()?
Shian
I tried reloading R and the library and it's working now. Thanks!