marge
Something is wrong
Hi @robertamezquita
Nice package. I have used it successfully in the past for mouse data, but I am running into problems with the TAIR10 genome. Do you see any issues here?
Thank you.
Data
head(data)
  seqnames   start     end       name width
1        2 3241484 3242161 region_141   678
2        2 3264247 3266248 region_151  2002
3        2 3278221 3278761 region_156   541
4        2 3279239 3280592 region_157  1354
5        2 3292341 3293676 region_162  1336
6        2 3294158 3294957 region_163   800
Background
head(bg)
  seqnames   start     end     name width
1        1  640542  640966 region_1   425
2        1 7498331 7498558 region_2   228
3        1 8392125 8392403 region_3   279
4        1 8806819 8807899 region_4  1081
5        1 9135657 9136059 region_5   403
6        1 9137844 9137974 region_6   131
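In case it helps narrow things down, here is a minimal sketch of the sanity checks I would run on both inputs before calling HOMER. `check_regions` is a hypothetical helper written for this issue, not part of marge or HOMER, and it assumes the five columns shown above. The example rows are copied from the `head()` output; the full data frames are not reproduced here.

```r
# Hypothetical helper (not part of marge or HOMER): basic checks on a
# regions data frame with the columns shown above.
check_regions <- function(df) {
  stopifnot(all(c("seqnames", "start", "end", "width") %in% names(df)))
  list(
    n_regions  = nrow(df),
    seqnames   = sort(unique(as.character(df$seqnames))),
    bad_coords = sum(df$start >= df$end),               # start must be < end
    width_ok   = all(df$width == df$end - df$start + 1) # 1-based, inclusive
  )
}

# First rows of `data` and `bg`, copied from the output above
data_head <- data.frame(
  seqnames = "2",
  start    = c(3241484, 3264247, 3278221),
  end      = c(3242161, 3266248, 3278761),
  name     = c("region_141", "region_151", "region_156"),
  width    = c(678, 2002, 541)
)
bg_head <- data.frame(
  seqnames = "1",
  start    = c(640542, 7498331, 8392125),
  end      = c(640966, 7498558, 8392403),
  name     = c("region_1", "region_2", "region_3"),
  width    = c(425, 228, 279)
)

str(check_regions(data_head))
str(check_regions(bg_head))
```

On these rows the coordinates and widths are consistent, so nothing obviously wrong shows up at this level.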
Running analysis
find_motifs_genome(x = data, path = "output/", genome = "tair10",
                   motif_length = c(6, 8, 10, 12), scan_size = 100,
                   optimize_count = 8, background = bg,
                   local_background = FALSE, only_known = FALSE,
                   only_denovo = FALSE, fdr_num = 5, cores = 10,
                   overwrite = TRUE, keep_minimal = FALSE)
Message
Position file = /tmp/RtmpSTvVaO/target_19236370535e
Genome = tair10
Output Directory = output/CpG/
Motif length set at 6,8,10,12,
Fragment size set to 100
Will optimize 8 putative motifs
Using 10 CPUs
Using 1000 MB for statistics cache
Will randomize and repeat motif finding 5 times to estimate FDR
background position file: /tmp/RtmpSTvVaO/background_19231d5f8434
Found mset for "arabidopsis", will check against plants motifs
Peak/BED file conversion summary:
BED/Header formatted lines: 555
peakfile formatted lines: 0
Peak File Statistics:
Total Peaks: 555
Redundant Peak IDs: 0
Peaks lacking information: 0 (need at least 5 columns per peak)
Peaks with misformatted coordinates: 0 (should be integer)
Peaks with misformatted strand: 0 (should be either +/- or 0/1)
Peak file looks good!
Peak/BED file conversion summary:
BED/Header formatted lines: 555
peakfile formatted lines: 0
Max distance to merge: 100 bp
Calculating co-bound peaks relative to reference: output/CpG//bg.clean.pos
Comparing peaks: (peakfile, overlapping peaks, logRatio(obs/expected), logP)
** output/CpG//target.clean..pos 555 6.32 -5989.42
** Pairwise stats are approx with fixed distance (-d 100) and -cobound #
They get worse as the size increases and peaks from a single file start overlapping
To get accurate ones, adjust peak sizes first with adjustPeakFile.pl and
then rerun mergePeaks with the "-d given" option (only applies to -cobound #)
Co-bound by 0 peaks: 0
Co-bound by 1 peaks: 555 (max: 555 effective total)
Custom genome sequence directory: /home/rstudio/r_lib/homer/.//data/genomes/tair10//
Extracting sequences from file: /home/rstudio/r_lib/homer/.//data/genomes/tair10///genome.fa
Looking for peak sequences in a single file (/home/rstudio/r_lib/homer/.//data/genomes/tair10///genome.fa)
Extracting 116 sequences from 1
Extracting 141 sequences from 2
Extracting 90 sequences from 3
Extracting 61 sequences from 4
Extracting 120 sequences from 5
Not removing redundant sequences
Sequences processed:
Auto detected maximum sequence length of 101 bp
528 total
Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
Freq Bin Count
0.2 0 23
0.25 1 23
0.3 2 60
0.35 3 86
0.4 4 87
0.45 5 125
0.5 6 73
0.6 7 46
0.7 8 5
Bin # Targets # Background Background Weight
Normalizing lower order oligos using homer2
Reading input files...
0 total sequences read
Autonormalization: 1-mers (4 total)
A inf% inf% -nan
C inf% inf% -nan
G inf% inf% -nan
T inf% inf% -nan
Autonormalization: 2-mers (16 total)
AA inf% inf% -nan
CA inf% inf% -nan
GA inf% inf% -nan
TA inf% inf% -nan
AC inf% inf% -nan
CC inf% inf% -nan
GC inf% inf% -nan
TC inf% inf% -nan
AG inf% inf% -nan
CG inf% inf% -nan
GG inf% inf% -nan
TG inf% inf% -nan
AT inf% inf% -nan
CT inf% inf% -nan
GT inf% inf% -nan
TT inf% inf% -nan
Autonormalization: 3-mers (64 total)
Normalization weights can be found in file: output/CpG//seq.autonorm.tsv
Converging on autonormalization solution:
...............................................................................
Final normalization: Autonormalization: 1-mers (4 total)
A inf% inf% -nan
C inf% inf% -nan
G inf% inf% -nan
T inf% inf% -nan
Autonormalization: 2-mers (16 total)
AA inf% inf% -nan
CA inf% inf% -nan
GA inf% inf% -nan
TA inf% inf% -nan
AC inf% inf% -nan
CC inf% inf% -nan
GC inf% inf% -nan
TC inf% inf% -nan
AG inf% inf% -nan
CG inf% inf% -nan
GG inf% inf% -nan
TG inf% inf% -nan
AT inf% inf% -nan
CT inf% inf% -nan
GT inf% inf% -nan
TT inf% inf% -nan
Autonormalization: 3-mers (64 total)
Finished preparing sequence/group files
----------------------------------------------------------
Known motif enrichment
Reading input files...
0 total sequences read
506 motifs loaded
Cache length = 15811
Using binomial scoring
Checking enrichment of 506 motif(s)
|0% 50% 100%|
=================================================================================
Illegal division by zero at /home/rstudio/r_lib/homer//bin/findKnownMotifs.pl line 152.
----------------------------------------------------------
De novo motif finding (HOMER)
Scanning input files...
!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!
Performing empirical FDR calculation for length 6 (n=5)
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
Scanning input files...
!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!
Performing empirical FDR calculation for length 8 (n=5)
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
Scanning input files...
!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!
Performing empirical FDR calculation for length 10 (n=5)
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
-blen automatically set to 2
Scanning input files...
!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!
Performing empirical FDR calculation for length 12 (n=5)
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
Use of uninitialized value in numeric gt (>) at /home/rstudio/r_lib/homer//bin/compareMotifs.pl line 1394.
!!! Filtered out all motifs!!!
Job finished - if results look good, please send beer to ..
Cleaning up tmp files...
Warning message:
In background != "automatic" && local_background != FALSE :
'length(x) = 11100 > 1' in coercion to 'logical(1)'
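A note on the warning above, which I do not know to be related to the HOMER failure: it appears to come from comparing a data frame with a string. When `background` is a data frame, `background != "automatic"` returns a logical matrix with one element per cell, and since R 4.2.0 `&&` warns when an operand has length greater than one (as far as I know this became an error in R 4.3.0). The sketch below reproduces the behaviour on a tiny stand-in for `bg`; `is_automatic` is a hypothetical guard for illustration, not marge's actual code.

```r
# Tiny stand-in for `bg` (two rows copied from the output above)
bg_example <- data.frame(
  seqnames = c("1", "1"),
  start    = c(640542L, 7498331L),
  end      = c(640966L, 7498558L)
)

# Comparing a data frame with a string yields a logical matrix,
# one element per cell, so its length is nrow * ncol.
cmp <- bg_example != "automatic"
dim(cmp)     # 2 3
length(cmp)  # 6

# Hypothetical guard (an assumption, not marge's code): only a single
# character string can be "automatic"; a data frame never is.
is_automatic <- function(background) {
  is.character(background) && length(background) == 1L &&
    background == "automatic"
}

is_automatic(bg_example)   # FALSE
is_automatic("automatic")  # TRUE
```

Because `is.character()` is FALSE for a data frame, `&&` short-circuits before the element-wise comparison is evaluated, so no length-greater-than-one value reaches it.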
SessionInfo
xfun::session_info()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS, RStudio 2022.7.1.554
Locale:
LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
Package version:
AnnotationDbi_1.58.0 askpass_1.1 assertthat_0.2.1
base64enc_0.1.3 bayestestR_0.13.0 beachmat_2.12.0
BH_1.78.0.0 Biobase_2.56.0 BiocFileCache_2.4.0
BiocGenerics_0.42.0 BiocIO_1.6.0 BiocParallel_1.30.3
biomaRt_2.52.0 Biostrings_2.64.1 bit_4.0.4
bit64_4.0.5 bitops_1.0-7 blob_1.2.3
brew_1.0.7 brio_1.1.3 BSgenome_1.64.0
bslib_0.4.0 bsseq_1.32.0 cachem_1.0.6
callr_3.7.2 cli_3.3.0 clipr_0.8.0
codetools_0.2-18 colorspace_2.0-3 commonmark_1.8.0
compiler_4.2.1 cpp11_0.4.2 crayon_1.5.1
credentials_1.3.2 crosstalk_1.2.0 curl_4.3.2
data.table_1.14.2 datawizard_0.6.2 DBI_1.1.3
dbplyr_2.2.1 DelayedArray_0.22.0 DelayedMatrixStats_1.18.1
desc_1.4.1 details_0.3.0 devtools_2.4.4
diffobj_0.3.5 digest_0.6.29 downlit_0.4.2
dplyr_1.0.10 DT_0.25 effectsize_0.7.0.5
ellipsis_0.3.2 evaluate_0.16 fansi_1.0.3
farver_2.1.1 fastmap_1.1.0 filelock_1.0.2
fontawesome_0.3.0 formatR_1.12 fs_1.5.2
futile.logger_1.4.3 futile.options_1.0.1 generics_0.1.3
GenomeInfoDb_1.32.4 GenomeInfoDbData_1.2.8 GenomicAlignments_1.32.1
GenomicFeatures_1.48.4 GenomicRanges_1.48.0 gert_1.8.0
gh_1.3.0 gitcreds_0.1.1 glue_1.6.2
graphics_4.2.1 grDevices_4.2.1 grid_4.2.1
gtools_3.9.3 HDF5Array_1.24.2 highr_0.9
hms_1.1.2 htmltools_0.5.3 htmlwidgets_1.5.4
httpuv_1.6.5 httr_1.4.4 ini_0.3.1
insight_0.18.4 IRanges_2.30.1 jquerylib_0.1.4
jsonlite_1.8.0 KEGGREST_1.36.3 knitr_1.40
labeling_0.4.2 lambda.r_1.2.4 later_1.3.0
lattice_0.20-45 lazyeval_0.2.2 lifecycle_1.0.1
limma_3.52.4 locfit_1.5-9.6 magrittr_2.0.3
marge_0.0.4.9999 Matrix_1.5-1 MatrixGenerics_1.8.1
matrixStats_0.62.0 memoise_2.0.1 methods_4.2.1
mime_0.12 miniUI_0.1.1.1 munsell_0.5.0
openssl_2.0.2 parallel_4.2.1 parameters_0.18.2
performance_0.10.0 permute_0.9-7 pillar_1.8.1
pkgbuild_1.3.1 pkgconfig_2.0.3 pkgdown_2.0.6
pkgload_1.3.0 plogr_0.2.0 png_0.1-7
praise_1.0.0 prettyunits_1.1.1 processx_3.7.0
profvis_0.3.7 progress_1.2.2 promises_1.2.0.1
ps_1.7.1 purrr_0.3.4 R.cache_0.16.0
R.methodsS3_1.8.2 R.oo_1.25.0 R.utils_2.12.0
R6_2.5.1 ragg_1.2.2 rappdirs_0.3.3
rcmdcheck_1.4.0 RColorBrewer_1.1.3 Rcpp_1.0.9
RCurl_1.98-1.9 readr_2.1.2 rematch2_2.1.2
remotes_2.4.2 report_0.5.5 restfulr_0.0.15
rhdf5_2.40.0 rhdf5filters_1.8.0 Rhdf5lib_1.18.2
Rhtslib_1.28.0 rjson_0.2.21 rlang_1.0.5
rmarkdown_2.16 roxygen2_7.2.1 rprojroot_2.0.3
Rsamtools_2.12.0 RSQLite_2.2.16 rstudioapi_0.14
rtracklayer_1.56.1 rversions_2.1.2 S4Vectors_0.34.0
sass_0.4.2 scales_1.2.1 sessioninfo_1.2.2
shiny_1.7.2 snow_0.4.4 sourcetools_0.1.7
sparseMatrixStats_1.8.0 stats_4.2.1 stats4_4.2.1
stringi_1.7.8 stringr_1.4.1 styler_1.7.0
SummarizedExperiment_1.26.1 sys_3.4 systemfonts_1.0.4
testthat_3.1.4 textshaping_0.3.6 tibble_3.1.8
tidyr_1.2.0 tidyselect_1.1.2 tinytex_0.41
tools_4.2.1 TxDb.Athaliana.BioMart.plantsmart51_0.99.0 tzdb_0.3.0
urlchecker_1.0.1 usethis_2.1.6 utf8_1.2.2
utils_4.2.1 vctrs_0.4.1 viridisLite_0.4.1
vroom_1.5.7 waldo_0.4.0 whisker_0.4
withr_2.5.0 writexl_1.4.0 xfun_0.32
XML_3.99-0.11 xml2_1.3.3 xopen_1.0.0
xtable_1.8-4 XVector_0.36.0 yaml_2.3.5
zip_2.2.0 zlibbioc_1.42.0