SCExecute's Capability to Split Pooled BAM Files into Single-Cell BAMs
Hello SCExecute Developers,
I'm currently exploring the functionality of SCExecute and I have a question regarding its capability to split pooled BAM files into individual BAM files for each single cell. Could you please clarify whether SCExecute has built-in functionality for this task, or if users need to pre-process their BAM files using tools like Samtools before using SCExecute?
Additionally, if SCExecute is indeed capable of splitting pooled BAM files into single-cell BAMs using a list of barcodes, could you kindly provide the exact command that I should use with SCExecute for this purpose?
Thank you for your assistance.
Hi @Giovanna0806! Thank you for your interest in SCExecute.
You will need a BAM file produced by STARsolo or CellRanger, which record the cell barcode of each aligned read as a BAM tag when they align your sequences against the reference. Alternatively, if your BAM file lacks these STARsolo/CellRanger annotations, you could use UMI-tools to label the aligned reads with their barcodes for SCExecute to use. Your BAM file should also be indexed (*.bam.bai present).
Assuming STARsolo-aligned BAM files with cell-barcode tags added (see the STARsolo documentation):
scExecute --readalignments <bam_file>.bam \
    --cellbarcode=STARsolo \
    --filetemplate "{BAMBASE}.{BARCODE}.bam"
If you have a list of barcodes (one per line, no header) in a file that you'd like to use:
scExecute --readalignments <bam_file>.bam \
    --cellbarcode=STARsolo \
    --barcode_acceptlist <barcodes.txt> \
    --filetemplate "{BAMBASE}.{BARCODE}.bam"
Hope this helps! Let me know if you have trouble getting it to work!
@Giovanna0806: Pasting the email reply in here so I don't lose track of it
Thank you for the help! I confirm that I have the .BAM file generated by aligning sequences using the STARsolo module, as recommended in scExecute documentation.
However, I have some questions regarding the procedure for variant calling using HaplotypeCaller. Considering the option to perform both the splitting of files into single cells and the variant calling in a single step, I would like to confirm if the following command would be the correct approach:
$scexecute_path --readalignments <bam_file>.bam \
    --cellbarcode=STARsolo \
    --barcode_acceptlist <barcodes.txt> \
    --filetemplate "{BAMBASE}.{BARCODE}.bam" \
    --command "gatk --java-options '-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true' HaplotypeCaller \
        -R $reference_fasta \
        -I {BAMFILE} \
        -O {OUTPUT_VCF}.gz \
        -ERC GVCF"
Thank you for providing such a valuable tool! Looking forward to your help on this matter.
Unless you want to keep the cell-specific BAM files around, you can omit the --filetemplate argument. I suggest you create a shell script to capture the details of how you want gatk to be executed. It should take the cell-specific BAM file as an argument and, optionally, the output filename (if you don't want to determine the output file in the script itself). The script should be written so that multiple copies can safely run at once.
$scexecute_path --readalignments <bam_file>.bam --cellbarcode=STARsolo --barcode_acceptlist <barcodes.txt> --command "$script_path/gatk_script.sh {CBPATH} {CBBASE}.vcf.gz"
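For illustration, here is a minimal sketch of what such a gatk_script.sh might look like. It is not part of SCExecute. It assumes the argument order from the command above (cell-specific BAM first, output VCF second), a hypothetical REFERENCE_FASTA environment variable pointing at your reference, and that samtools is on the PATH in case the cell-specific BAM has no index yet. Each invocation gets its own temporary directory so that parallel copies do not collide.

```shell
#!/usr/bin/env bash
# gatk_script.sh (hypothetical name): run HaplotypeCaller on one cell-specific BAM.
# Usage: gatk_script.sh <cell.bam> <output.vcf.gz>
# Assumes REFERENCE_FASTA is set in the environment (an assumption of this sketch).

call_cell_variants() {
    local bam="$1" out_vcf="$2"
    local ref="${REFERENCE_FASTA:?set REFERENCE_FASTA to the reference FASTA path}"
    local tmp_dir rc

    # Private scratch directory per invocation, so concurrent runs do not clash.
    tmp_dir="$(mktemp -d)" || return 1

    # HaplotypeCaller needs an indexed BAM; build the index only if it is missing.
    if [ ! -e "${bam}.bai" ]; then
        samtools index "$bam" || { rm -rf "$tmp_dir"; return 1; }
    fi

    gatk --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
        HaplotypeCaller \
        -R "$ref" \
        -I "$bam" \
        -O "$out_vcf" \
        -ERC GVCF \
        --tmp-dir "$tmp_dir"
    rc=$?

    rm -rf "$tmp_dir"
    return "$rc"
}

# Only run when called with both arguments (e.g. from the --command template).
if [ "$#" -ge 2 ]; then
    call_cell_variants "$1" "$2"
fi
```

Note that the 8 GB Java heap applies to each running copy, so you may want to lower -Xmx if many cells are processed in parallel.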
Hope this helps...
Thank you, Nathan. It was very helpful and it worked just fine for me.