
IDR-pooled peak list

Open Erfan1369 opened this issue 2 years ago • 0 comments

Hi,

I am working on ChIP-seq data and using idr to evaluate the consistency between replicates. I have some questions about ${POOLED_PEAK_FILE}.

Where can we find the information needed to generate the ${POOLED_PEAK_FILE} that is used as the peak list for the IDR analysis (for TF analysis)? We have not found a complete description of this file in the ENCODE3 pipeline v1 specifications (https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#). Would it be a concatenation of ${REP1_PEAK_FILE} and ${REP2_PEAK_FILE}, followed by the same procedure used for histone marks (described at the end of the pipeline specification)? That procedure reads:

For narrowPeak files

======================

Find pooled peaks that overlap Rep1 and Rep2 where overlap is defined as the fractional overlap wrt any one of the overlapping peak pairs >= 0.5

intersectBed -wo -a Pooled.narrowPeak.gz -b Rep1.narrowPeak.gz |

awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq |

intersectBed -wo -a stdin -b Rep2.narrowPeak.gz |

awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq > PooledInRep1AndRep2.narrowPeak.gz
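To make sure we are reading the overlap criterion correctly, here is a minimal sketch of how we understand it. This is our own illustration, not part of the ENCODE specification: `overlap_ok` is a hypothetical helper and the coordinates are made up. It takes the start and end of a pooled peak and of a replicate peak and applies the same test as the awk filter above, where `$21` is the number of overlapping bases reported by `intersectBed -wo`, `s1` is the pooled peak length and `s2` is the replicate peak length.

```shell
# Hypothetical helper (not from the ENCODE pipeline).
# Usage: overlap_ok POOLED_START POOLED_END REP_START REP_END
# Prints "keep" if the overlap covers >= 50% of either peak, else "drop".
overlap_ok() {
  awk -v a_start="$1" -v a_end="$2" -v b_start="$3" -v b_end="$4" 'BEGIN {
    s1 = a_end - a_start                      # pooled peak length ($3-$2 above)
    s2 = b_end - b_start                      # replicate peak length ($13-$12 above)
    lo = (a_start > b_start) ? a_start : b_start
    hi = (a_end < b_end) ? a_end : b_end
    ov = hi - lo                              # overlapping bases ($21 above)
    if (ov < 0) ov = 0
    if ((ov / s1 >= 0.5) || (ov / s2 >= 0.5)) {
      print "keep"
    } else {
      print "drop"
    }
  }'
}

overlap_ok 100 200 150 400    # 50 bp overlap, half of the 100 bp pooled peak
overlap_ok 100 200 180 400    # 20 bp overlap, below half of either peak
overlap_ok 100 1000 150 250   # replicate peak lies entirely inside the pooled peak
```

Under this reading, a pooled peak is kept as soon as the overlap reaches half the length of either peak in the pair, so a short replicate peak sitting inside a long pooled peak is enough to retain the pooled peak.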

If so, would the following be correct?

Pooled.narrowPeak.gz = concatenation of ${REP1_PEAK_FILE} and ${REP2_PEAK_FILE}

PooledInRep1AndRep2.narrowPeak.gz = ${POOLED_PEAK_FILE}

We also generated ${POOLED_PEAK_FILE} using the cat command as follows, but the output was not what we expected:

cat ${REP1_PEAK_FILE} ${REP2_PEAK_FILE} > ${POOLED_PEAK_FILE}

It would be a great help if you could provide more detail on how to generate ${POOLED_PEAK_FILE}.

Regards,

Erfan

Erfan1369, Apr 29 '22 13:04