CUT-RUNTools-2.0 icon indicating copy to clipboard operation
CUT-RUNTools-2.0 copied to clipboard

SEACR input and spike-in normalization

Open jordan841220 opened this issue 1 year ago • 0 comments

Thank you for your contribution in this amazing tool. My question might be naive but I couldn't find any explanation in the paper or github (or maybe I simply ignored it).

  1. From the peak calling section in the pipeline as described below, I found that the input bedgraph file of SEACR was inherently the intermediate ouput of macs2 callpeak. I was curious why this CUT&RUN pipeline was designed like this, instead of converting the original BAM/BED file to bedgraph file using bedtools genomecov, and use it as the input of SEACR (I have tried and the results were very different), which was wrote in the SEACR tutorial.
# macs2 narrow peak calling
>&2 echo "[info] macs2 narrow peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdir -q 0.01 -B --SPMR --keep-dup all 2> $logdir/"$base_file".macs2.narrow

# macs2 broad peak calling
>&2 echo "[info] macs2 broad peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdirbroad --broad --broad-cutoff 0.1 -B --SPMR --keep-dup all 2> $logdir/"$base_file".macs2.broad

# macs2 broad peak calling
>&2 echo "[info] Getting broad peak summits"
$pythonbin/python $extratoolsbin/get_summits_broadPeak.py $outdirbroad/"$base_file"_peaks.broadPeak | $bedopsbin/sort-bed - > $outdirbroad/"$base_file"_summits.bed

# SEACR peak calls
>&2 echo "[info] SEACR stringent peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdirseac -q 0.01 -B --SPMR --keep-dup all 2> $logdir/"$base_file".seacr
$pythonbin/python $extratoolsbin/change.bdg.py $outdirseac/"$base_file"_treat_pileup.bdg > $outdirseac/"$base_file"_treat_integer.bdg
chmod +x $extratoolsbin/SEACR_1.1.sh
$extratoolsbin/SEACR_1.1.sh $outdirseac/"$base_file"_treat_integer.bdg 0.01 non stringent $outdirseac/"$base_file"_treat $Rscriptbin
$bedopsbin/sort-bed $outdirseac/"$base_file"_treat.stringent.bed > $outdirseac/"$base_file"_treat.stringent.sort.bed
$pythonbin/python $extratoolsbin/get_summits_seacr.py $outdirseac/"$base_file"_treat.stringent.bed|$bedopsbin/sort-bed - > $outdirseac/"$base_file"_treat.stringent.sort.summits.bed
for i in _summits.bed _peaks.xls _peaks.narrowPeak _control_lambda.bdg _treat_pileup.bdg; do 
    rm -rf $outdirseac/"$base_file"$i
done
  1. The pipeline conducted the spike_in normalization using bamCoverage after peak calling. Isn't it make sense to use the spike_in-normalized bedgraph file to perform peak calling?

Thanks!!

jordan841220 avatar Sep 08 '23 01:09 jordan841220