peakHeatmap split.default error

Open thugib opened this issue 1 year ago • 0 comments

When I run peakHeatmap, I am getting the following error:

peakHeatmap(MYB3sortGR, TxDb = txdb, upstream = 500, downstream = 500, ignore_strand = TRUE) preparing promoter regions... 2024-01-13 02:13:09 AM preparing tag matrix... 2024-01-13 02:13:09 AM preparing start_site regions by gene... 2024-01-13 02:13:09 AM preparing tag matrix... 2024-01-13 02:13:09 AM Error in split.default(1:length(windows), as.factor(seqnames(windows))) : group length is 0 but data length > 0 In addition: Warning message: In .merge_two_Seqinfo_objects(x, y) : The 2 combined objects have no sequence levels in common. (Use suppressWarnings() to suppress this warning.)

The GRanges object, MYB3sortGR, looks like this:

object with 6213 ranges and 2 metadata columns:

       seqnames            ranges strand |                V4        V5
          <Rle>         <IRanges>  <Rle> |       <character> <integer>
     1     chr1       20174-21639      * |     RepAll_peak_7       292
     2     chr1       37751-38229      * |    RepAll_peak_10       294
     3     chr1       54918-55737      * |    RepAll_peak_20       212
     4     chr1       56447-61014      * |    RepAll_peak_21       185
     5     chr1       67017-68917      * |    RepAll_peak_25       405
   ...      ...               ...    ... .               ...       ...
  6209     chr5 26938521-26939493      * | RepAll_peak_32796       475
  6210     chr5 26949788-26950604      * | RepAll_peak_32801       316
  6211     chr5 26957657-26958139      * | RepAll_peak_32803       165
  6212     chr5 26959763-26960842      * | RepAll_peak_32804       196
  6213     chr5 26969279-26969874      * | RepAll_peak_32809       203
  -------
  seqinfo: 5 sequences from an unspecified genome; no seqlengths

txdb looks like this:

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Arabidopsis thaliana
# Taxonomy ID: 3702
# Resource URL: plants.ensembl.org:80
# BioMart database: plants_mart
# BioMart database version: Ensembl Plants Genes 51
# BioMart dataset: athaliana_eg_gene
# BioMart dataset description: Arabidopsis thaliana genes (TAIR10)
# BioMart dataset version: TAIR10
# Full dataset: yes
# miRBase build ID: TAIR10
# Nb of transcripts: 54013
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-05-23 17:17:33 +0200 (Sun, 23 May 2021)
# GenomicFeatures version at creation time: 1.45.0
# RSQLite version at creation time: 2.2.7
# DBSCHEMAVERSION: 1.2

The code works fine when I use tagMatrix. I get a heatmap.

promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrix <- getTagMatrix(MYB3GR, windows = promoter)
peakHeatmap(MYB3GR, TxDb = txdb, upstream  = 1000, downstream = 1000 )

The promoter GRanges object looks like this:

GRanges object with 32823 ranges and 0 metadata columns:
          seqnames        ranges strand
             <Rle>     <IRanges>  <Rle>
      [1]     chr1     2631-4631      +
      [2]     chr1    8130-10130      -
      [3]     chr1   12714-14714      -
      [4]     chr1   22121-24121      +
      [5]     chr1   27500-29500      +
      ...      ...           ...    ...
  [32819]     chrM 180268-182268      +
  [32820]     chrM 191025-193025      -
  [32821]     chrM 332725-334725      -
  [32822]     chrM 346266-348266      +
  [32823]     chrM 358739-360739      -
  -------
  seqinfo: 7 sequences from an unspecified genome; no seqlengths

My MYB3sortGR was made from a bed file downloaded from GEO. It was created by:

MYB3sort <- read.delim(file = "GSE80564_AT1G22640_MYB3_ABA_optimal_narrowPeak_p16_ChipSeek_sort.bed", sep = "\t", header = FALSE) MYB3sortGR <- toGRanges(MYB3sort)

The top of the bed file looks like this:

chr5 7014051 7015074 RepAll_peak_27510 883 chr5 19837638 19838886 RepAll_peak_30304 862 chr2 7258604 7262109 RepAll_peak_9773 854 chr1 5590500 5591353 RepAll_peak_1885 836

MYB3sortGR does not have the leftmost column numbers in brackets, so I though maybe there is something wrong with the GRanges object, so I then added column names to my .bed file. The first few lines of the .bed file would now look like this:

chr | start | end | peak | score | chr1 | 20174 | 21639 | RepAll_peak_7 | 292 | chr1 | 37751 | 38229 | RepAll_peak_10 | 294 |

and imported the bed file into R using read.delim() and header = TRUE, to make the dataframe, MYB3sortCN.

From the dataframe, I made a GRanges object:

makeGRangesFromDataFrame(MYB3sortCN)

MYB3sortCN
GRanges object with 6213 ranges and 0 metadata columns:
         seqnames            ranges strand
            <Rle>         <IRanges>  <Rle>
     [1]     chr1       20174-21639      *
     [2]     chr1       37751-38229      *
     [3]     chr1       54918-55737      *
     [4]     chr1       56447-61014      *
     [5]     chr1       67017-68917      *
     ...      ...               ...    ...
  [6209]     chr5 26938521-26939493      *
  [6210]     chr5 26949788-26950604      *
  [6211]     chr5 26957657-26958139      *
  [6212]     chr5 26959763-26960842      *
  [6213]     chr5 26969279-26969874      *
  -------
  seqinfo: 5 sequences from an unspecified genome; no seqlengths

Now, the leftmost column numbers are in brackets as is for the promoters GR object. However, when I run peakHeatmap using this new MYB3sortCN GRanges object, I still get an error, albeit I different error.

peakHeatmap(MYB3sortCN, TxDb = txdb, upstream = 1000, downstream = 1000 )

preparing promoter regions... 2024-01-13 05:04:37 PM preparing tag matrix... 2024-01-13 05:04:37 PM preparing start_site regions by gene... 2024-01-13 05:04:38 PM preparing tag matrix... 2024-01-13 05:04:38 PM Error in file.exists(peak) : invalid 'file' argument

My sessioninfo:

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS (fossa-ditto X84)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] clusterProfiler_4.8.3                      regioneR_1.32.0                           
 [3] BiocManager_1.30.22                        ChIPseeker_1.39.0                         
 [5] TxDb.Athaliana.BioMart.plantsmart51_0.99.0 GenomicFeatures_1.54.1                    
 [7] AnnotationDbi_1.64.1                       Biobase_2.62.0                            
 [9] GenomicRanges_1.54.1                       GenomeInfoDb_1.38.5                       
[11] IRanges_2.36.0                             S4Vectors_0.40.2                          
[13] BiocGenerics_0.48.1                       

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3                      jsonlite_1.8.8                         
  [3] rstudioapi_0.15.0                       magrittr_2.0.3                         
  [5] farver_2.1.1                            fs_1.6.3                               
  [7] BiocIO_1.12.0                           zlibbioc_1.48.0                        
  [9] vctrs_0.6.5                             memoise_2.0.1                          
 [11] Rsamtools_2.18.0                        RCurl_1.98-1.14                        
 [13] ggtree_3.10.0                           S4Arrays_1.2.0                         
 [15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 progress_1.2.3                         
 [17] plotrix_3.8-4                           curl_5.2.0                             
 [19] SparseArray_1.2.3                       gridGraphics_0.5-1                     
 [21] KernSmooth_2.23-22                      plyr_1.8.9                             
 [23] cachem_1.0.8                            GenomicAlignments_1.38.1               
 [25] igraph_1.6.0                            lifecycle_1.0.4                        
 [27] pkgconfig_2.0.3                         gson_0.1.0                             
 [29] Matrix_1.6-3                            R6_2.5.1                               
 [31] fastmap_1.1.1                           GenomeInfoDbData_1.2.11                
 [33] MatrixGenerics_1.14.0                   digest_0.6.34                          
 [35] aplot_0.2.2                             enrichplot_1.20.3                      
 [37] colorspace_2.1-0                        patchwork_1.2.0                        
 [39] RSQLite_2.3.4                           labeling_0.4.3                         
 [41] filelock_1.0.3                          fansi_1.0.6                            
 [43] httr_1.4.7                              polyclip_1.10-6                        
 [45] abind_1.4-5                             compiler_4.3.2                         
 [47] downloader_0.4                          bit64_4.0.5                            
 [49] withr_2.5.2                             BiocParallel_1.36.0                    
 [51] viridis_0.6.4                           DBI_1.2.0                              
 [53] gplots_3.1.3                            ggforce_0.4.1                          
 [55] biomaRt_2.58.0                          MASS_7.3-60                            
 [57] rappdirs_0.3.3                          DelayedArray_0.28.0                    
 [59] rjson_0.2.21                            HDO.db_0.99.1                          
 [61] caTools_1.18.2                          gtools_3.9.5                           
 [63] tools_4.3.2                             scatterpie_0.2.1                       
 [65] ape_5.7-1                               glue_1.7.0                             
 [67] restfulr_0.0.15                         nlme_3.1-163                           
 [69] GOSemSim_2.28.0                         shadowtext_0.1.2                       
 [71] grid_4.3.2                              reshape2_1.4.4                         
 [73] fgsea_1.28.0                            generics_0.1.3                         
 [75] BSgenome_1.68.0                         gtable_0.3.4                           
 [77] tidyr_1.3.0                             data.table_1.14.10                     
 [79] hms_1.1.3                               tidygraph_1.3.0                        
 [81] xml2_1.3.6                              utf8_1.2.4                             
 [83] XVector_0.42.0                          ggrepel_0.9.5                          
 [85] pillar_1.9.0                            stringr_1.5.1                          
 [87] yulab.utils_0.1.3                       splines_4.3.2                          
 [89] dplyr_1.1.4                             tweenr_2.0.2                           
 [91] treeio_1.26.0                           BiocFileCache_2.10.1                   
 [93] lattice_0.22-5                          rtracklayer_1.62.0                     
 [95] bit_4.0.5                               tidyselect_1.2.0                       
 [97] GO.db_3.18.0                            Biostrings_2.70.1                      
 [99] gridExtra_2.3                           SummarizedExperiment_1.32.0            
[101] graphlayouts_1.0.2                      matrixStats_1.2.0                      
[103] stringi_1.8.3                           lazyeval_0.2.2                         
[105] ggfun_0.1.3                             yaml_2.3.8                             
[107] boot_1.3-28.1                           codetools_0.2-19                       
[109] ggraph_2.1.0                            tibble_3.2.1                           
[111] qvalue_2.34.0                           ggplotify_0.1.2                        
[113] cli_3.6.2                               munsell_0.5.0                          
[115] Rcpp_1.0.12                             dbplyr_2.4.0                           
[117] png_0.1-8                               XML_3.99-0.16                          
[119] parallel_4.3.2                          ggplot2_3.4.4                          
[121] blob_1.2.4                              prettyunits_1.2.0                      
[123] DOSE_3.28.2                             bitops_1.0-7                           
[125] tidytree_0.4.6                          viridisLite_0.4.2                      
[127] scales_1.3.0                            purrr_1.0.2                            
[129] crayon_1.5.2                            rlang_1.1.3                            
[131] cowplot_1.1.2                           fastmatch_1.1-4                        
[133] KEGGREST_1.42.0

Any idea of what my problem is ?

Jan 14 '24 03:01 thugib