IDR-pooled peak list
Hi,
I am working on ChIP-seq data and using IDR to evaluate the consistency between replicates. I have some questions about ${POOLED_PEAK_FILE}.
Where can we find instructions for generating the ${POOLED_PEAK_FILE} that is used as the peak list for the IDR analysis (for TF data)? We could not find a complete description of this file in the ENCODE3 pipeline v1 specifications (https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#). Is it a concatenation of ${REP1_PEAK_FILE} and ${REP2_PEAK_FILE}, followed by the same procedure used for histone marks (described at the end of the pipeline specification and quoted here)?
For narrowPeak files
======================
Find pooled peaks that overlap Rep1 and Rep2 where overlap is defined as the fractional overlap wrt any one of the overlapping peak pairs >= 0.5
intersectBed -wo -a Pooled.narrowPeak.gz -b Rep1.narrowPeak.gz |
awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq |
intersectBed -wo -a stdin -b Rep2.narrowPeak.gz |
awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq > PooledInRep1AndRep2.narrowPeak.gz
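To make the overlap rule in the quoted command concrete, here is a minimal sketch that applies the same awk test to two hand-written toy lines. It assumes the input is laid out like `intersectBed -wo` output for two narrowPeak files: 10 columns from the pooled file, 10 columns from the replicate, and the overlap length in column 21. The function names `filter_overlap` and `toy_lines` and all coordinates are made up for illustration; this is not taken from the ENCODE specification.

```shell
# Hypothetical helper: keep the pooled peak (columns 1-10) when the overlap
# length (column 21) is at least half of the pooled peak length (s1) or at
# least half of the replicate peak length (s2).
filter_overlap() {
  awk 'BEGIN{FS="\t";OFS="\t"}
       {s1=$3-$2; s2=$13-$12;
        if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' |
  cut -f 1-10 | sort | uniq
}

# Toy input (assumed values). Fields are written with spaces for readability
# and converted to tabs before filtering.
toy_lines() {
  cat <<'EOF' | tr ' ' '\t'
chr1 100 200 p1 0 . 5 3 2 50 chr1 150 400 r1 0 . 4 3 2 60 50
chr1 1000 2000 p2 0 . 5 3 2 500 chr1 1900 2900 r2 0 . 4 3 2 400 100
EOF
}

# p1: overlap 50 over a 100 bp pooled peak gives 0.5, so it is kept.
# p2: overlap 100 over two 1000 bp peaks gives 0.1 for both, so it is dropped.
toy_lines | filter_overlap
```

Only the `p1` line should be printed, truncated to its first 10 columns. Note that the test is an OR: a short peak fully contained in a much longer one passes even though the overlap is a small fraction of the longer peak.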
If so, does that mean:
Pooled.narrowPeak.gz = concatenation of ${REP1_PEAK_FILE} and ${REP2_PEAK_FILE}
PooledInRep1AndRep2.narrowPeak.gz = ${POOLED_PEAK_FILE}
Is that correct?
We also tried generating ${POOLED_PEAK_FILE} with the cat command as follows, but the output was not what we expected:
cat ${REP1_PEAK_FILE} ${REP2_PEAK_FILE} > ${POOLED_PEAK_FILE}
It would be a great help if you could give us more detail on how to generate ${POOLED_PEAK_FILE}.
Regards,
Erfan