scATAC-pro
how to run a scATAC-pro analysis with multiple biological replicates
Dear developers of scATAC-pro,
I am not sure which steps are needed to run a scATAC-pro analysis with up to 6 replicates, each with a different sequencing depth and number of captured cells. Could you give some suggestions?
Thank you.
Haibo
You can process each sample using the `process` module first, followed by the `integrate` module.
Hello Wenbao,
Thank you for your quick response. I did run it that way, but the integrate module does not seem to perform all of the downstream analysis. So I ran the downstream module after integration and got errors because the position-sorted BAM files were not found. The error message is below:
[E::hts_open_format] Failed to open file output/mapping_result/PBMC.positionsort.MAPQ30.bam
samtools view: failed to open "output/mapping_result/PBMC.positionsort.MAPQ30.bam" for reading: No such file or directory
The following are my job scripts.
## Process
#!/bin/bash
#BSUB -n 48 # minimal number of processors required for a parallel job
#BSUB -R rusage[mem=8000] # memory to reserve
#BSUB -R "select[rh=8]"
#BSUB -W 128:00 # wall-clock limit (128 hours)
#BSUB -J "fastQC[1-6]" # job array, one element per sample
#BSUB -q long # which queue we want to run in
#BSUB -o logs/out.%J.%I.txt # log
#BSUB -e logs/err.%J.%I.txt # error
#BSUB -R "span[hosts=1]" # all processors on the same host
##BSUB -w "done(5423513)"

i=$(($LSB_JOBINDEX - 1))
mkdir -p logs

fastq_dir=($(ls -d data/*_fastqs | grep -v '10k_PBMC_ATAC_nextgem_Chromium'))
out_dir=(PBMC_10K_N PBMC_1K_N PBMC_500_N PBMC_5K_N PBMC_10K_V PBMC_5K_V)

time singularity exec --cleanenv --env ncore=${LSB_DJOB_NUMPROC} \
    scATAC-pro_sandbox \
    /usr/local/bin/scATAC-pro_1.5.0/scATAC-pro -s process \
    -i ${fastq_dir[$i]} \
    -o results/${out_dir[$i]} \
    -c scatac_pro.config
## Integration
#!/bin/bash
#BSUB -n 8 # minimal number of processors required for a parallel job
#BSUB -R rusage[mem=8000] # memory to reserve
#BSUB -R "select[rh=8]"
#BSUB -W 128:00 # wall-clock limit (128 hours)
#BSUB -J "fastQC[1]"
#BSUB -q long # which queue we want to run in
#BSUB -o logs/out.%J.%I.txt # log
#BSUB -e logs/err.%J.%I.txt # error
#BSUB -R "span[hosts=1]" # all processors on the same host
##BSUB -w "done(5423513)"

i=$(($LSB_JOBINDEX - 1))
mkdir -p logs

# Comma-separated list of per-sample peak files (ends with a trailing comma)
peak_files=$(ls -1 results/PBMC_*/peaks/MACS2/PBMC_features_BlacklistRemoved.bed | perl -p -e 's{\n}{,}')

time singularity exec --cleanenv --env ncore=${LSB_DJOB_NUMPROC} \
    scATAC-pro_sandbox \
    /usr/local/bin/scATAC-pro_1.5.0/scATAC-pro -s integrate \
    -i ${peak_files}200,0.01 \
    -c scatac_pro.config
## Downstream
#!/bin/bash
#BSUB -n 8 # minimal number of processors required for a parallel job
#BSUB -R rusage[mem=8000] # memory to reserve
#BSUB -R "select[rh=8]"
#BSUB -W 128:00 # wall-clock limit (128 hours)
#BSUB -J "fastQC[1]"
#BSUB -q long # which queue we want to run in
#BSUB -o logs/out.%J.%I.txt # log
#BSUB -e logs/err.%J.%I.txt # error
#BSUB -R "span[hosts=1]" # all processors on the same host
##BSUB -w "done(5423513)"

i=$(($LSB_JOBINDEX - 1))
mkdir -p logs
peak_cell_mtx=output/integrated/seurat_obj_harmony.rds

time singularity exec --cleanenv --env ncore=${LSB_DJOB_NUMPROC} \
    scATAC-pro_sandbox \
    /usr/local/bin/scATAC-pro_1.5.0/scATAC-pro -s downstream \
    -i ${peak_cell_mtx} \
    -c scatac_pro.config
On Tue, Aug 23, 2022 at 10:27 AM Wenbao Yu @.***> wrote:
You can process each sample using process module first, followed by integrate module.
— Reply to this email directly, view it on GitHub https://github.com/wbaopaul/scATAC-pro/issues/58#issuecomment-1224155078, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALR5WD3RY6EGZHJPBCCGUYLV2TNVDANCNFSM57LIU6DA . You are receiving this because you authored the thread.Message ID: @.***>
Hello Haibo,
Oh, the `downstream` module was designed for a single sample, not for an integrated object. You may need to run the specific modules, such as clustering and motif_analysis, separately with the integrated Seurat object.
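As a rough illustration of this suggestion, the sketch below builds one command per module instead of calling `downstream`. The module names come from the reply above and the paths from the job scripts earlier in the thread. It assumes that each module accepts the integrated Seurat object through `-i`, which the thread does not confirm. `build_cmd` and `DRY_RUN` are hypothetical names used only for this example.

```shell
#!/bin/bash
# Sketch only: run individual modules on the integrated object instead of
# the all-in-one "downstream" module.
# ASSUMPTION: each module accepts the integrated Seurat object through -i;
# this thread does not confirm it, so check the scATAC-pro documentation.

integrated_obj=output/integrated/seurat_obj_harmony.rds  # path from the downstream job script
config=scatac_pro.config
scatac_pro=/usr/local/bin/scATAC-pro_1.5.0/scATAC-pro

# build_cmd (hypothetical helper): print the command line for one module.
build_cmd() {
    echo "singularity exec --cleanenv scATAC-pro_sandbox ${scatac_pro} -s $1 -i ${integrated_obj} -c ${config}"
}

# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
for module in clustering motif_analysis; do
    cmd=$(build_cmd "${module}")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd}"
    else
        ${cmd}
    fi
done
```

Printing the commands first makes it easy to check the paths before submitting anything to the cluster.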
Thank you, Wenbao, for the instructions.
One question is how I should provide a BAM file that scATAC-pro recognizes for the merged experiment in downstream analysis. Do I need to manually merge the individual filtered, sorted BAM files into one large file and put it in a properly named directory?
Best,
Haibo
Thanks. I am not sure; I haven't used scATAC-pro with a merged BAM file for integrated data. Which specific steps within the downstream module do you want to run? For integrated data you don't have to run everything in the downstream module, which was designed for a single sample.
Wenbao
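If a manual merge is attempted anyway, one possible approach is `samtools merge` followed by `samtools index`, sketched below. This is not a confirmed scATAC-pro workflow. The per-sample BAM location (`results/<sample>/mapping_result/...`) is an assumption pieced together from the `-o` option in the process script and the path in the error message, and `DRY_RUN` is a hypothetical switch for this example.

```shell
#!/bin/bash
# Sketch only: merge per-sample BAM files into the path that the failed
# downstream run was looking for.
# ASSUMPTIONS: every sample's BAM follows the naming pattern from the error
# message, and scATAC-pro accepts a merged BAM; neither is confirmed here.

samples="PBMC_10K_N PBMC_1K_N PBMC_500_N PBMC_5K_N PBMC_10K_V PBMC_5K_V"
merged=output/mapping_result/PBMC.positionsort.MAPQ30.bam

# Collect the assumed per-sample BAM paths (none contain spaces).
inputs=""
for s in ${samples}; do
    inputs="${inputs} results/${s}/mapping_result/PBMC.positionsort.MAPQ30.bam"
done

# Inputs are position sorted, so the merged file stays position sorted.
merge_cmd="samtools merge -@ ${ncore:-4} ${merged}${inputs}"
index_cmd="samtools index ${merged}"

# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${merge_cmd}"
    echo "${index_cmd}"
else
    mkdir -p "$(dirname "${merged}")"
    ${merge_cmd} && ${index_cmd}
fi
```

Cell barcodes from different samples can collide in a merged file, so barcodes may need a sample-specific tag before any per-cell step is run on it.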
Thanks again, Wenbao. I think I need to run all the steps except "split_bam" after integration. By the way, does the pipeline remove duplicates in the final filtered, sorted BAM files?
Haibo
The duplicates were not removed from the bam file. (But note that we only use unique fragments in scATAC-pro analysis.)
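To illustrate the distinction between duplicates left in the BAM file and unique fragments used for analysis, here is a toy sketch. It assumes a fragment is identified by chromosome, start, end and cell barcode. The data are made up, and this is not scATAC-pro's actual implementation.

```shell
#!/bin/bash
# Sketch only: what "unique fragments" means when duplicates stay in the BAM.
# ASSUMPTION: a fragment is identified by chromosome, start, end and barcode.

# Toy fragment list (tab-separated: chr, start, end, barcode).
# The first two rows are duplicates of each other.
fragments=$(printf 'chr1\t100\t250\tAAAC\nchr1\t100\t250\tAAAC\nchr1\t100\t250\tTTTG\nchr2\t500\t640\tAAAC\n')

# Count all rows, then count rows after collapsing identical ones.
total=$(printf '%s\n' "${fragments}" | wc -l)
unique=$(printf '%s\n' "${fragments}" | sort -u | wc -l)

echo "total fragments:  $((total))"
echo "unique fragments: $((unique))"
```

On this toy input the script reports 4 total and 3 unique fragments: the repeated row is counted once, while the same coordinates with a different barcode are kept as a separate fragment.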
Got it. Thank you.
Haibo