SnapATAC
SnapATAC copied to clipboard
runHomer error
Hi,
I am getting no results when i runHomer.
I add pmat to my snapATAC file in R. the peak files (bed) added to the .snap file are in the format chr1 102949 103224 I also tried with bed file in the format b'chr1' 102949 103224 but have the same error
after i runHomer i get 'Illegal division by zero at /home/......../homer/bin//findKnownMotifs.pl line 152.' I think this is because of
Sequences processed: 0 total
given after the peak file is assessed.
Please help :)
here i all my code in R:
motifs = runHomer(
-
x.sp[,idy.ls[["3"]],"pmat"],
-
mat = "pmat",
-
path.to.homer = "/home/......../homer/bin/findMotifsGenome.pl",
-
result.dir = "./homer/C3",
-
num.cores=5,
-
genome = 'hg38',
-
motif.length = 10,
-
scan.size = 200,
-
optimize.count = 2,
-
background = 'automatic',
-
local.background = FALSE,
-
only.known = TRUE,
-
only.denovo = FALSE,
-
fdr.num = 5,
-
cache = 100,
-
overwrite = TRUE,
-
keep.minimal = FALSE
-
);
Position file = ./homer/C3/target_2cec741c5915 Genome = hg38 Output Directory = ./homer/C3 Motif length set at 10, Fragment size set to 200 Will optimize 2 putative motifs Using 5 CPUs Using 100 MB for statistics cache Will randomize and repeat motif finding 5 times to estimate FDR Will not run homer for de novo motifs Found mset for "human", will check against vertebrates motifs Peak/BED file conversion summary: BED/Header formatted lines: 2000 peakfile formatted lines: 0
Peak File Statistics: Total Peaks: 2000 Redundant Peak IDs: 0 Peaks lacking information: 0 (need at least 5 columns per peak) Peaks with misformatted coordinates: 0 (should be integer) Peaks with misformatted strand: 0 (should be either +/- or 0/1)
Peak file looks good!
Background files for 200 bp fragments found. Custom genome sequence directory: /home/......./homer//data/genomes/hg38//
Extracting sequences from file: /home/......./homer//data/genomes/hg38///genome.fa Looking for peak sequences in a single file (/home/......../homer//data/genomes/hg38///genome.fa)
Not removing redundant sequences
Sequences processed: 0 total
Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8 Freq Bin Count
Total sequences set to 50000
Choosing background that matches in CpG/GC content... Illegal division by zero at /home/...../homer/bin//assignGeneWeights.pl line 63. Assembling sequence file... Normalizing lower order oligos using homer2
Reading input files... 0 total sequences read Autonormalization: 1-mers (4 total) A inf% inf% -nan C inf% inf% -nan G inf% inf% -nan T inf% inf% -nan Autonormalization: 2-mers (16 total) AA inf% inf% -nan CA inf% inf% -nan GA inf% inf% -nan TA inf% inf% -nan AC inf% inf% -nan CC inf% inf% -nan GC inf% inf% -nan TC inf% inf% -nan AG inf% inf% -nan CG inf% inf% -nan GG inf% inf% -nan TG inf% inf% -nan AT inf% inf% -nan CT inf% inf% -nan GT inf% inf% -nan TT inf% inf% -nan Autonormalization: 3-mers (64 total) Normalization weights can be found in file: ./homer/C3/seq.autonorm.tsv Converging on autonormalization solution: ............................................................................... Final normalization: Autonormalization: 1-mers (4 total) A inf% inf% -nan C inf% inf% -nan G inf% inf% -nan T inf% inf% -nan Autonormalization: 2-mers (16 total) AA inf% inf% -nan CA inf% inf% -nan GA inf% inf% -nan TA inf% inf% -nan AC inf% inf% -nan CC inf% inf% -nan GC inf% inf% -nan TC inf% inf% -nan AG inf% inf% -nan CG inf% inf% -nan GG inf% inf% -nan TG inf% inf% -nan AT inf% inf% -nan CT inf% inf% -nan GT inf% inf% -nan TT inf% inf% -nan Autonormalization: 3-mers (64 total) Finished preparing sequence/group files
Known motif enrichment
Reading input files... 0 total sequences read 440 motifs loaded Cache length = 5000 Using binomial scoring Checking enrichment of 440 motif(s) |0% 50% 100%|
Illegal division by zero at /home/..../homer/bin//findKnownMotifs.pl line 152. Skipping... Job finished - if results look good, please send beer to ..
Cleaning up tmp files...
Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file './homer/C3/knownResults.txt': No such file or directory
I am getting the same error, but still don't have a solution. I am wondering if this is because I am using hg38 instead of hg19?
I tried hg19 and hg38 and get the same error.
On Tue, 28 Jul 2020 at 13:04, ldorozco [email protected] wrote:
I am getting the same error, but still don't have a solution. I am wondering if this is because I am using hg38 instead of hg19?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/r3fang/SnapATAC/issues/202#issuecomment-665190752, or unsubscribe https://github.com/notifications/unsubscribe-auth/AM3S4C2TQWNTXGHZKNZJ2ATR54HLVANCNFSM4PFZDMMQ .
I figured it out. The 'peaks' GRanges object is messed up, it has "b'" attached to the beginning of each chromosome:
head(x.sp@peak) [1] b'chr1' 191418-192119 * | b'chr1':191418-192119 [2] b'chr1' 634093-634293 * | b'chr1':634093-634293
If I fIx this GRanges object as follows, Homer can run:
library(stringr) tmp_gr = as.data.frame(x.sp@peak) tmp_seqnames = as.character(tmp_gr[,1]) tmp_seqnames_new = str_replace( str_replace(tmp_seqnames, "b'", ""), "'", "") tmp_gr[,1] = tmp_seqnames_new tmp_names = as.character(tmp_gr[,6]) tmp_names_new = str_replace( str_replace(tmp_names, "b'", ""), "'", "") tmp_gr[,6] = tmp_names_new tmp_gr_new = makeGRangesFromDataFrame(tmp_gr, keep.extra.columns=TRUE) x.sp@peak = tmp_gr_new
This works! thanks for the tip!