
How to run a scATAC-pro analysis with multiple biological replicates

Open haibol2016 opened this issue 2 years ago • 8 comments

Dear developers of scATAC-pro,

I am not sure what steps are needed to run a scATAC-pro-based analysis with up to 6 replicates, each with a different sequencing depth and number of captured cells. Could you give some suggestions?

Thank you.

Haibo

haibol2016 • Aug 23 '22 12:08

You can process each sample using the process module first, followed by the integrate module.

wbaopaul • Aug 23 '22 14:08

Hello Wenbao, Thank you for your quick response. I did run it that way, but the integrate module does not seem to perform all of the downstream analysis. So I ran the downstream module after integration and got errors because the position-sorted BAM files were not detected. The error message is below:

[E::hts_open_format] Failed to open file output/mapping_result/PBMC.positionsort.MAPQ30.bam
samtools view: failed to open "output/mapping_result/PBMC.positionsort.MAPQ30.bam" for reading: No such file or directory

The following are my job scripts.

## Process

#!/bin/bash

#BSUB -n 48 # minimal number of processors required for a parallel job
#BSUB -R rusage[mem=8000] # memory request
#BSUB -R "select[rh=8]"
#BSUB -W 128:00 # run-time limit of 128 hours
#BSUB -J "fastQC[1-6]" # job array, one element per sample
#BSUB -q long # which queue we want to run in
#BSUB -o logs/out.%J.%I.txt # log
#BSUB -e logs/err.%J.%I.txt # error
#BSUB -R "span[hosts=1]" # all processors on the same host
##BSUB -w "done(5423513)"

i=$(($LSB_JOBINDEX - 1))
mkdir -p logs

fastq_dir=($(ls -d data/*_fastqs | grep -v '10k_PBMC_ATAC_nextgem_Chromium'))
out_dir=(PBMC_10K_N PBMC_1K_N PBMC_500_N PBMC_5K_N PBMC_10K_V PBMC_5K_V)
time singularity exec --cleanenv --env ncore=${LSB_DJOB_NUMPROC} scATAC-pro_sandbox \
    /usr/local/bin/scATAC-pro_1.5.0/scATAC-pro -s process \
    -i ${fastq_dir[$i]} \
    -o results/${out_dir[$i]} \
    -c scatac_pro.config

## Integration

#!/bin/bash

#BSUB -n 8 # minimal number of processors required for a parallel job
#BSUB -R rusage[mem=8000] # memory request
#BSUB -R "select[rh=8]"
#BSUB -W 128:00 # run-time limit of 128 hours
#BSUB -J "fastQC[1]"
#BSUB -q long # which queue we want to run in
#BSUB -o logs/out.%J.%I.txt # log
#BSUB -e logs/err.%J.%I.txt # error
#BSUB -R "span[hosts=1]" # all processors on the same host
##BSUB -w "done(5423513)"

i=$(($LSB_JOBINDEX - 1))
mkdir -p logs

# comma-separated list of peak files; the perl substitution leaves a trailing comma
peak_files=$(ls -1 results/PBMC_*/peaks/MACS2/PBMC_features_BlacklistRemoved.bed | perl -p -e 's{\n}{,}')

time singularity exec --cleanenv --env ncore=${LSB_DJOB_NUMPROC} scATAC-pro_sandbox \
    /usr/local/bin/scATAC-pro_1.5.0/scATAC-pro -s integrate \
    -i ${peak_files}200,0.01 \
    -c scatac_pro.config
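
The comma-separated `-i` list used for the integrate step can also be assembled without perl. This is a minimal sketch, assuming the same `results/PBMC_*/...` layout as in the script; `join_peak_files` is a hypothetical helper name, and the trailing `200,0.01` values are simply carried over from the script rather than explained here.

```shell
# Hypothetical helper: join the given paths with commas (no trailing comma).
join_peak_files() {
    # printf emits one path per line; paste -s joins the lines with ","
    printf '%s\n' "$@" | paste -sd, -
}

# Example use (paths and trailing values taken from the job script):
# peak_files=$(join_peak_files results/PBMC_*/peaks/MACS2/PBMC_features_BlacklistRemoved.bed)
# scATAC-pro -s integrate -i "${peak_files},200,0.01" -c scatac_pro.config
```

Because the helper returns the list without a trailing comma, the separator before `200,0.01` is written explicitly, which yields the same final string as the perl version.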

## Downstream

#!/bin/bash

#BSUB -n 8 # minimal number of processors required for a parallel job
#BSUB -R rusage[mem=8000] # memory request
#BSUB -R "select[rh=8]"
#BSUB -W 128:00 # run-time limit of 128 hours
#BSUB -J "fastQC[1]"
#BSUB -q long # which queue we want to run in
#BSUB -o logs/out.%J.%I.txt # log
#BSUB -e logs/err.%J.%I.txt # error
#BSUB -R "span[hosts=1]" # all processors on the same host
##BSUB -w "done(5423513)"

i=$(($LSB_JOBINDEX - 1))
mkdir -p logs
peak_cell_mtx=output/integrated/seurat_obj_harmony.rds

time singularity exec --cleanenv --env ncore=${LSB_DJOB_NUMPROC} scATAC-pro_sandbox \
    /usr/local/bin/scATAC-pro_1.5.0/scATAC-pro -s downstream \
    -i ${peak_cell_mtx} \
    -c scatac_pro.config


haibol2016 • Aug 23 '22 14:08

Hello Haibo, Oh, the downstream module was designed for a single sample, not for an integrated object. You may need to run the specific modules, such as clustering and motif_analysis, separately with the integrated Seurat object.

wbaopaul • Aug 23 '22 14:08

Thank you, Wenbao, for the instructions.

One question is how I should provide a scATAC-pro-aware BAM file for the merged experiment for downstream analysis. Do I need to manually merge the individual filtered, sorted BAM files into one large file and put it into a properly named directory?

Best,

Haibo
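
For reference, a manual merge of position-sorted BAM files could look roughly like the sketch below. It uses only `samtools merge` and `samtools index`; the helper name `merge_bams` and the example paths are assumptions modeled on the error message, and nothing in this thread confirms that scATAC-pro accepts a merged file prepared this way. Cell barcodes from different samples may also collide in a merged BAM unless they carry a sample-specific prefix.

```shell
# Hypothetical helper: merge position-sorted BAM files and index the result.
# Usage: merge_bams <merged.bam> <in1.bam> <in2.bam> [...]
merge_bams() {
    out=$1
    shift
    if [ "$#" -lt 2 ]; then
        echo "merge_bams: need at least two input BAM files" >&2
        return 1
    fi
    mkdir -p "$(dirname "$out")"
    # -f overwrites an existing output file; -@ sets the number of extra threads
    samtools merge -f -@ "${ncore:-1}" "$out" "$@" && samtools index "$out"
}

# Example call (paths are assumptions, not a documented scATAC-pro layout):
# merge_bams output/mapping_result/PBMC.positionsort.MAPQ30.bam \
#     results/PBMC_*/mapping_result/PBMC.positionsort.MAPQ30.bam
```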


haibol2016 • Aug 23 '22 15:08

Thanks. I am not sure. I haven't used scATAC-pro with a merged BAM file for integrated data. Which specific steps within the downstream module do you want to run? For integrated data, you don't have to run everything in the downstream module, which was designed for a single sample.

Wenbao

wbaopaul • Aug 23 '22 15:08

Thanks again, Wenbao. I think I need to run all the steps except "split_bam" after integration. By the way, does the pipeline remove duplicates in the final filtered, sorted BAM files?

Haibo


haibol2016 • Aug 23 '22 15:08

The duplicates were not removed from the bam file. (But note that we only use unique fragments in scATAC-pro analysis.)
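
If duplicates need to be inspected or dropped from a BAM file by hand, one possible approach is sketched below. It only works when duplicate reads carry the standard duplicate flag (0x400, decimal 1024); this thread does not say whether scATAC-pro sets that flag, so treat it as an assumption. Both helper names are hypothetical.

```shell
# Hypothetical helper: count alignments that carry the duplicate flag (0x400).
count_flagged_duplicates() {
    samtools view -c -f 1024 "$1"
}

# Hypothetical helper: write a copy of the BAM without flagged duplicates.
# Usage: drop_flagged_duplicates <in.bam> <out.bam>
drop_flagged_duplicates() {
    samtools view -b -F 1024 -o "$2" "$1"
}
```

A count of zero from the first helper would suggest that duplicates are not flagged in the file, in which case the second helper would leave the reads unchanged.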

wbaopaul • Aug 23 '22 15:08

Got it. Thank you.

Haibo


haibol2016 • Aug 23 '22 19:08