
SCExecute's Capability to Split Pooled BAM Files into Single-Cell BAMs

Open Giovanna0806 opened this issue 1 year ago • 3 comments

Hello SCExecute Developers,

I'm currently exploring the functionality of SCExecute and I have a question regarding its capability to split pooled BAM files into individual BAM files for each single cell. Could you please clarify whether SCExecute has built-in functionality for this task, or if users need to pre-process their BAM files using tools like Samtools before using SCExecute?

Additionally, if SCExecute is indeed capable of splitting pooled BAM files into single-cell BAMs using a list of barcodes, could you kindly provide the exact command that I should use with SCExecute for this purpose?

Thank you for your assistance.

Giovanna0806 avatar Apr 03 '24 00:04 Giovanna0806

Hi @Giovanna0806! Thank you for your interest in SCExecute.

You will need a BAM file produced by STARsolo or CellRanger when it aligns your sequences against the reference; these aligners can also tag each aligned read with its cell barcode. Alternatively, if your BAM file lacks these STARsolo/CellRanger annotations, you could use UMI-tools to label the aligned reads with their barcodes for SCExecute to use. In either case, your BAM file should be indexed (*.bam.bai present).
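A quick pre-flight check along these lines might save a failed run. This is only a sketch: the function names `bam_is_indexed` and `count_cb_tagged` are made up for illustration, and it assumes the cell barcode is stored in the `CB:Z:` read tag, as STARsolo and CellRanger conventionally do.

```shell
# bam_is_indexed BAM: succeeds if an index file sits next to the BAM.
bam_is_indexed() {
    [ -f "$1.bai" ] || [ -f "${1%.bam}.bai" ] || [ -f "$1.csi" ]
}

# count_cb_tagged: reads SAM-format alignment records on stdin and prints
# how many carry a CB:Z: tag. A value of "-" is treated as "no barcode"
# here, which is an assumption of this sketch.
count_cb_tagged() {
    awk -F '\t' '
        {
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^CB:Z:/ && $i != "CB:Z:-") { n++; break }
            }
        }
        END { print n + 0 }
    '
}
```

With samtools installed, it could be used as `bam_is_indexed sample.bam || samtools index sample.bam`, followed by `samtools view sample.bam | head -n 10000 | count_cb_tagged` (`sample.bam` is a placeholder). A count of 0 suggests the reads are not barcode-tagged.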

Assuming a STARsolo-aligned BAM file with cell-barcode tags added (see the STARsolo documentation):

scExecute --readalignments <bam_file>.bam \
                 --cellbarcode=STARsolo \
                 --filetemplate "{BAMBASE}.{BARCODE}.bam"

If you have a list of barcodes (one per line, no header) in a file that you'd like to use:

scExecute --readalignments <bam_file>.bam \
                 --cellbarcode=STARsolo \
                 --barcode_acceptlist <barcodes.txt> \
                 --filetemplate "{BAMBASE}.{BARCODE}.bam"
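If the barcode list comes from a spreadsheet or another tool, it may be worth normalising it first so that it really is one barcode per line with no header. A rough sketch follows; `clean_barcode_list` is a hypothetical helper, and the check assumes barcodes are nucleotide strings with an optional CellRanger-style numeric suffix such as `-1`.

```shell
# clean_barcode_list IN OUT: write a tidied copy of a barcode list.
clean_barcode_list() {
    local in="$1" out="$2"

    # Strip Windows line endings, trim surrounding whitespace, drop empty
    # lines, and keep only the first occurrence of each barcode.
    tr -d '\r' < "$in" |
        awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); if ($0 != "" && !seen[$0]++) print }' > "$out"

    # Report anything that does not look like a barcode (for example a
    # header line) on stderr and fail, rather than silently removing it.
    if grep -vnE '^[ACGTN]+(-[0-9]+)?$' "$out" >&2; then
        echo "suspicious lines found in $in (listed above)" >&2
        return 1
    fi
}
```

For example, `clean_barcode_list raw_barcodes.txt barcodes.txt` (both file names are placeholders) would produce a list suitable for the barcode list argument above, or stop with an error if a header line is still present.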

Hope this helps! Let me know if you have trouble getting it to work!

edwardsnj avatar Apr 03 '24 20:04 edwardsnj

@Giovanna0806: Pasting the email reply in here so I don't lose track of it

Thank you for the help! I confirm that I have the .BAM file generated by aligning sequences using the STARsolo module, as recommended in scExecute documentation.

However, I have some questions regarding the procedure for variant calling using HaplotypeCaller. Considering the option to perform both the splitting of files into single cells and the variant calling in a single step, I would like to confirm if the following command would be the correct approach:

$scexecute_path --readalignments <bam_file>.bam \
--cellbarcode=STARsolo \
--barcode_acceptlist <barcodes.txt> \
--filetemplate "{BAMBASE}.{BARCODE}.bam" \
--command "gatk --java-options '-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true' HaplotypeCaller \
-R $reference_fasta \
-I {BAMFILE} \
-O {OUTPUT_VCF}.gz \
-ERC GVCF"

Thank you for providing such a valuable tool! Looking forward to your help on this matter.

edwardsnj avatar Apr 11 '24 16:04 edwardsnj

Unless you want to keep the cell-specific BAM files around, you can omit the --filetemplate argument. I suggest you create a shell script to capture the details of how you want gatk to be executed. It should take the cell-specific BAM file as an argument and, optionally, the output filename (if you don't want to determine the output file in the script itself). The script should be written so that multiple copies can run at once.

$scexecute_path --readalignments <bam_file>.bam --cellbarcode=STARsolo --barcode_acceptlist <barcodes.txt> --command "$script_path/gatk_script.sh {CBPATH} {CBBASE}.vcf.gz"
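One possible shape for such a gatk_script.sh is sketched below; it is not a tested pipeline. The HaplotypeCaller options are the ones from the command quoted earlier in this thread. The function name and the `GATK` and `REFERENCE_FASTA` environment variables are assumptions made for this example. To stay safe when several copies run at once, each run writes to its own temporary file and renames it only after gatk succeeds.

```shell
#!/usr/bin/env bash
# gatk_script.sh: hypothetical wrapper, run as
#   gatk_script.sh <cell-specific BAM> <output .vcf.gz>
# GATK (path to the gatk launcher) and REFERENCE_FASTA are environment
# variables assumed for this sketch.

call_variants_for_cell() {
    local bam="$1" out="$2"
    local gatk="${GATK:-gatk}"
    local ref="${REFERENCE_FASTA:-}"

    if [ -z "$bam" ] || [ -z "$out" ]; then
        echo "usage: gatk_script.sh <cell.bam> <out.vcf.gz>" >&2
        return 2
    fi
    if [ ! -f "$bam" ]; then
        echo "missing BAM: $bam" >&2
        return 1
    fi
    if [ ! -f "$ref" ]; then
        echo "REFERENCE_FASTA is not set or not a file: $ref" >&2
        return 1
    fi

    # Per-process temporary name in the output directory that keeps the
    # .vcf.gz suffix. Concurrent runs never share it, and the final name
    # only appears once the run has succeeded.
    local tmp
    tmp="$(dirname "$out")/.tmp.$$.$(basename "$out")"

    if ! "$gatk" --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
            HaplotypeCaller -R "$ref" -I "$bam" -O "$tmp" -ERC GVCF; then
        rm -f "$tmp" "$tmp.tbi"
        return 1
    fi

    mv "$tmp" "$out"
    # Move the index too if one was written next to the temporary VCF.
    if [ -f "$tmp.tbi" ]; then
        mv "$tmp.tbi" "$out.tbi"
    fi
}

# Run only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    call_variants_for_cell "$@"
fi
```

With `REFERENCE_FASTA` exported beforehand, the command above would then invoke this script once per barcode, passing the cell-specific BAM path and the output name as its two arguments.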

Hope this helps...

edwardsnj avatar Apr 11 '24 16:04 edwardsnj

Thank you, Nathan. It was very helpful and it worked just fine for me.

Giovanna0806 avatar May 25 '24 22:05 Giovanna0806