
read.meme.py unable to find summary.tsv

Open mostafaabuzaid25 opened this issue 3 years ago • 0 comments

` ==================================== Bulk data analysis pipeline will run ==============================================================

Input FASTQ folder: /public/home/mosta/cut_run/HEK293_Nov23_2020

Sample name: CLP1_293T_S2

Workdir folder: /public/home/mosta/cut_run/HEK293_Nov23_2020/results/

Experiment name:

Experiment type: CUT&RUN

Reference genome: hg19

Spike-in genome: FALSE

Spike-in normalization: FALSE

Fragment 120 filtration: FALSE

=================================================================================================================================
[info] Input file is CLP1_293T_S2_R1_001.fastq.gz and CLP1_293T_S2_R2_001.fastq.gz
Wed Sep 29 23:52:40 CST 2021
[info] Trimming file CLP1_293T_S2 ...
Wed Sep 29 23:52:51 CST 2021
[info] Use Truseq adaptor as default
[info] Second stage trimming CLP1_293T_S2 ...
Thu Sep 30 00:38:43 CST 2021
[info] Aligning file CLP1_293T_S2 to reference genome...
Thu Sep 30 01:09:09 CST 2021
[info] Bowtie2 command: --very-sensitive-local --phred33 -I 10 -X 700
[info] The dovetail mode is off [as parameter frag_120 is off]
[info] FASTQ files won't be aligned to the spike-in genome
[info] Filtering unmapped fragments... CLP1_293T_S2.bam
Thu Sep 30 01:25:56 CST 2021
[info] Sorting BAM... CLP1_293T_S2.bam
Thu Sep 30 01:38:37 CST 2021
INFO 2021-09-30 01:39:09 SortSam

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see: ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** SortSam -INPUT sorted/CLP1_293T_S2.step1.bam -OUTPUT sorted/CLP1_293T_S2.bam -SORT_ORDER coordinate -VALIDATION_STRINGENCY SILENT


01:39:37.011 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/home/mosta/CUT-RUNTools-2.0/install/picard-2.8.0.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Sep 30 01:39:37 CST 2021] SortSam INPUT=sorted/CLP1_293T_S2.step1.bam OUTPUT=sorted/CLP1_293T_S2.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Sep 30 01:39:37 CST 2021] Executing as mosta@s006 on Linux 3.10.0-862.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.7-SNAPSHOT
INFO 2021-09-30 01:39:37 SortSam Seen many non-increasing record positions. Printing Read-names as well.
INFO 2021-09-30 01:41:35 SortSam Read 10,000,000 records. Elapsed time: 00:01:57s. Time for last 10,000,000: 117s. Last read position: chr6:6,079,001. Last read name: M01057:324:000000000-JD6K3:1:2103:22747:14904
INFO 2021-09-30 01:43:11 SortSam Read 20,000,000 records. Elapsed time: 00:03:34s. Time for last 10,000,000: 96s. Last read position: chr12:27,239,374. Last read name: M01057:324:000000000-JD6K3:1:1106:11883:14869
INFO 2021-09-30 01:43:59 SortSam Finished reading inputs, merging and writing to output now.
INFO 2021-09-30 01:48:17 SortSam Wrote 10,000,000 records from a sorting collection. Elapsed time: 00:08:40s. Time for last 10,000,000: 256s. Last read position: chr1:194,534,006
INFO 2021-09-30 01:51:58 SortSam Wrote 20,000,000 records from a sorting collection. Elapsed time: 00:12:21s. Time for last 10,000,000: 220s. Last read position: chr7:11,711,635
[Thu Sep 30 01:53:24 CST 2021] picard.sam.SortSam done. Elapsed time: 13.80 minutes.
Runtime.totalMemory()=8648654848
[info] Marking duplicates... CLP1_293T_S2.bam
Thu Sep 30 01:53:28 CST 2021
[info] Removing duplicates... CLP1_293T_S2.bam
Thu Sep 30 02:31:46 CST 2021
[info] Using all the qualified fragments NOT filtering <120bp... CLP1_293T_S2.bam
Thu Sep 30 02:39:16 CST 2021
[info] Creating bam index files... CLP1_293T_S2.bam
Thu Sep 30 02:39:16 CST 2021
[info] Reads shifting
Thu Sep 30 02:46:52 CST 2021
[info] Your data won't be shifted as the experiment_type is specified as CUT&RUN...
[info] Peak calling using MACS2... CLP1_293T_S2.bam
[info] Logs are stored in /public/home/mosta/cut_run/HEK293_Nov23_2020/results//logs
Thu Sep 30 02:46:53 CST 2021
[info] Peak calling with BAM file with NO duplications
[info] macs2 narrow peak calling
[info] macs2 broad peak calling
[info] Getting broad peak summits
[info] SEACR stringent peak calling
Calling enriched regions without control file
Proceeding without normalization of control to experimental bedgraph
Using stringent threshold
Creating experimental AUC file: Thu Sep 30 03:51:25 CST 2021
Calculating optimal AUC threshold: Thu Sep 30 03:51:27 CST 2021
Using user-provided threshold: Thu Sep 30 03:51:27 CST 2021
Creating thresholded feature file: Thu Sep 30 03:53:25 CST 2021
Empirical false discovery rate = 0.01
Merging nearby features and eliminating control-enriched features: Thu Sep 30 03:53:25 CST 2021
Removing temporary files: Thu Sep 30 03:53:25 CST 2021
Done: Thu Sep 30 03:53:25 CST 2021
[info] Generating the normalized signal file with BigWig format...
Thu Sep 30 03:53:26 CST 2021
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /public/home/mosta/cut_run/HEK293_Nov23_2020/results//peakcalling/macs2.narrow/CLP1_293T_S2_peaks.narrowPeak
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Unknown option: dreme-m
The sequences specified do not exist.

meme-chip [options] [-db ]*

Options:
-o : output to the specified directory, failing if the directory exists
-oc : output to the specified directory, overwriting if the directory exists
-db : target database for use by Tomtom and CentriMo; if not present, Tomtom and CentriMo are not run
-neg : negative (control) sequence file name; the control sequences will be input to MEME, CentriMo and STREME, and MEME will use the Differential Enrichment objective function; sequences are assumed to originate from the same alphabet as the primary sequence file and should be the same length as those; default: no negative sequences are used for MEME or CentriMo, and for STREME, the primary sequences are shuffled to create the negative set
-psp-gen use the psp-gen program to create a position-specific prior for use by MEME with its Classic objective function; requires -neg; default: input control sequences directly to MEME and use its Differential Enrichment objective function
-dna set the alphabet to DNA; this is the default
-rna set the alphabet to RNA
-[x]alph : alphabet file; when the x is specified the motif databases are converted to the specified alphabet; default: DNA
-dna2rna : input DNA sequences will be converted to RNA
-bfile : background file
-order : set the order of the Markov background model that is generated from the sequences when a background file is not given; default: 2
-seed : seed for the randomized selection of sequences for MEME and the shuffling of sequences for STREME; default: seed randomly
-minw : minimum motif width; default: 6
-maxw : maximum motif width; default: 15
-ccut : maximum size of a sequence before it is cut down to a centered section; a value of 0 indicates the sequences should not be cut down; default: 100
-group-thresh : primary threshold for clustering motifs; default: 0.05
-group-weak : secondary threshold for clustering motifs; default: 2*gthr
-filter-thresh : E-value threshold for including motifs; default: 0.05
-time : maximum time that this program has to run and create output in; default: no limit
-desc : description of the job
-fdesc : file containing plain text description of the job
-old-clustering : pick cluster seed motifs based only on significance; default: preferentially select discovered motifs as clustering seeds even if there is a library motif that appears more enriched
-noecho : don't echo the commands run
-help : display this help message
-version : print the version and exit

MEME Specific Options:
-meme-brief : reduce size of MEME output files if more than : primary sequences
-meme-mod [oops|zoops|anr]: sites used in a single sequence
-meme-nmotifs : maximum number of motifs to find; default: 3 : if =0, MEME will not be run
-meme-minsites : minimum number of sites per motif
-meme-maxsites : maximum number of sites per motif
-meme-p : use parallel version with processors
-meme-pal : look for palindromes only
-meme-searchsize : the maximum portion of the primary sequences (in characters) : used for motif search; MEME's running time increases as : roughly the square of
-meme-nrand : MEME should not randomize sequence order

STREME Specific Options:
-streme-pvt : stop if hold-out set p-value greater than
-streme-nmotifs : maximum number of motifs to find; overrides -streme-pvt; : if =0, STREME will not be run

CentriMo Specific Options:
-centrimo-local : compute enrichment of all regions (not only central)
-centrimo-score : set the minimum allowed match score
-centrimo-maxreg : set the maximum region size to be considered
-centrimo-ethresh : set the E-value threshold for reporting
-centrimo-noseq : don't store sequence IDs in the output
-centrimo-flip : reflect matches on reverse strand around center

SpaMo Specific Options: -spamo-skip : don't run SpaMo

FIMO Specific Options: -fimo-skip : don't run FIMO

[info] De Novo motifs can be found: random1000/MEME_CLP1_293T_S2_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
  File "/public/home/mosta/CUT-RUNTools-2.0/install/read.meme.py", line 92, in
    ss = read_summary(this_dir + "/summary.tsv")
  File "/public/home/mosta/CUT-RUNTools-2.0/install/read.meme.py", line 7, in read_summary
    f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_CLP1_293T_S2_shuf/summary.tsv'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_CLP1_293T_S2_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Scaning the De Novo motifs for each peak
ls: cannot access random1000/MEME_CLP1_293T_S2_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/CLP1_293T_S2

Congrats! The bulk data analysis is complete!


`
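Reading the log, the missing `summary.tsv` looks like a downstream symptom rather than the root problem. `meme-chip` rejects `dreme-m` and prints its usage text, which lists STREME options but no DREME ones, so it seems no MEME output was written before `read.meme.py` ran. Below is a minimal Python sketch of two checks that would surface this earlier. The helper names `option_supported` and `check_meme_output` are hypothetical and are not part of CUT-RUNTools-2.0 or the MEME Suite. The usage excerpt is copied from the log above.

```python
import re
from pathlib import Path


def option_supported(help_text: str, option: str) -> bool:
    """Return True if `option` appears as a complete flag in the usage text."""
    # Reject matches that are only part of a longer flag name,
    # e.g. "-streme" inside "-streme-nmotifs".
    pattern = r"(?<![\w-])" + re.escape(option) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def check_meme_output(meme_dir: str) -> Path:
    """Hypothetical guard: fail with a readable message if summary.tsv is absent."""
    summary = Path(meme_dir) / "summary.tsv"
    if not summary.is_file():
        raise FileNotFoundError(
            f"{summary} not found; meme-chip probably failed earlier "
            "(look for 'Unknown option' in the log)."
        )
    return summary


# Excerpt of the meme-chip usage text printed in the log above.
usage_excerpt = (
    "STREME Specific Options: -streme-pvt : stop if hold-out set p-value "
    "greater than -streme-nmotifs : maximum number of motifs to find"
)
print(option_supported(usage_excerpt, "-dreme-m"))        # False
print(option_supported(usage_excerpt, "-streme-nmotifs"))  # True
```

In a real run the usage text would come from the installed `meme-chip` itself (the log shows it accepts `-help`), and `check_meme_output` would be called on `random1000/MEME_CLP1_293T_S2_shuf` before the motifs are loaded.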

mostafaabuzaid25, Sep 29 '21 21:09