popscle
dsc-pileup running progressively slower for a single sample
Hi All,
Happy New Year! I am having a similar issue to the one in the other active thread about dsc-pileup running slowly. It starts off fine, then for some reason gets gradually slower:
https://files.slack.com/files-pri/T02SU7LHA-FS739HWTE/image_from_ios.jpg
As you can see, the time between consecutive lines gets longer and longer. This example has only been running for a short time, but when I ran it last week it went for about 5 days without finishing. Sometimes there are about 5 hours between lines! I'm not sure why it gets progressively slower as the program runs. Just as an FYI, I have run demuxlet on the same Linux server on all the same samples in the past with no issues. I am trying to run dsc-pileup now so that I can run freemuxlet afterwards. Many thanks in advance for all your help.
Some additional info: I am running on a Linux server with 64 GB RAM, and this is my command:
~/popscle/bin/popscle dsc-pileup --sam /mnt/usb-storage/ihg-client.ucsf.edu/yej/190627_A00269_0205_BHJ7HFDMXX_fastqs_analysis/MS-TCZ-1_pool_1-1/possorted_genome_bam.bam --vcf ~/ALL_SingleCell_data/Tocilizumab/080919-TCZ-genotyping/ucsc.hg38.liftover.out.nochr.vcf --group-list /mnt/usb-storage/ihg-client.ucsf.edu/yej/190627_A00269_0205_BHJ7HFDMXX_fastqs_analysis/MS-TCZ-1_pool_1-1/filtered_feature_bc_matrix/barcodes.tsv --out /mnt/usb-storage/ihg-client.ucsf.edu/yej/190627_A00269_0205_BHJ7HFDMXX_fastqs_analysis/MS-TCZ-1_pool_1-1/pileup
I think it is hitting the memory limit, and thrashing seems to be happening. I cannot see the image at the link to confirm. Can you send it?
Hyun.
Hyun Min Kang, Ph.D. Associate Professor of Biostatistics University of Michigan, Ann Arbor Email : [email protected]
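One way to check the memory hypothesis is to watch the resident memory of the running process over time. The following is a minimal sketch, not part of popscle: the function names `rss_mb` and `watch_rss` are hypothetical, and it assumes a `ps` that supports `-o rss=` (reported in KiB), as on typical Linux systems.

```shell
#!/usr/bin/env bash

# Print the resident set size of process $1 in MiB.
# Assumes `ps -o rss=` reports KiB, as on typical Linux systems.
rss_mb() {
    ps -o rss= -p "$1" | awk '{ printf "%d\n", $1 / 1024 }'
}

# Print a timestamped RSS reading every $2 seconds (default 60)
# for as long as process $1 is alive.
watch_rss() {
    local pid="$1" interval="${2:-60}"
    while kill -0 "$pid" 2>/dev/null; do
        printf '%s\t%s MiB\n' "$(date +%T)" "$(rss_mb "$pid")"
        sleep "$interval"
    done
}
```

For example, `watch_rss "$(pgrep -n popscle)" 60` would log one reading per minute. If the readings climb toward the 64 GB of physical RAM while the gaps between dsc-pileup log lines grow, that would be consistent with swapping.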
For some reason I am having trouble getting a link to work, so I will send it to your email.
@xAZx I might have a solution for your problem.
In my case dsc-pileup was very slow (for some test samples it took 200 hours).
So I made the following: https://github.com/aertslab/popscle_helper_tools
which lets me run dsc-pileup on the filtered BAM file in only 20 minutes.
$ ./filter_bam_file_for_popscle_dsc_pileup.sh
Usage: filter_bam_file_for_popscle_dsc_pileup input_bam_filename barcodes_tsv_filename vcf_filename output_bam_filename
Purpose: Filter BAM file for usage with dsc-pileup of popscle by keeping reads that:
- overlap with SNPs in the VCF file
- and have a cell barcode contained in the cell barcode list
Keeping only relevant reads for dsc-pileup can speedup it up several hunderd times.
So for your sample, the following should work.
# Create filtered BAM with only the reads dsc-pileup needs.
./filter_bam_file_for_popscle_dsc_pileup.sh \
/mnt/usb-storage/ihg-client.ucsf.edu/yej/190627_A00269_0205_BHJ7HFDMXX_fastqs_analysis/MS-TCZ-1_pool_1-1/possorted_genome_bam.bam \
/mnt/usb-storage/ihg-client.ucsf.edu/yej/190627_A00269_0205_BHJ7HFDMXX_fastqs_analysis/MS-TCZ-1_pool_1-1/filtered_feature_bc_matrix/barcodes.tsv \
~/ALL_SingleCell_data/Tocilizumab/080919-TCZ-genotyping/ucsc.hg38.liftover.out.nochr.vcf \
/tmp/MS-TCZ-1_pool_1-1.filter_bam_file_for_popscle_dsc_pileup.bam
# Use filtered BAM file for dsc-pileup.
~/popscle/bin/popscle dsc-pileup \
--sam /tmp/MS-TCZ-1_pool_1-1.filter_bam_file_for_popscle_dsc_pileup.bam \
--vcf ~/ALL_SingleCell_data/Tocilizumab/080919-TCZ-genotyping/ucsc.hg38.liftover.out.nochr.vcf \
--group-list /mnt/usb-storage/ihg-client.ucsf.edu/yej/190627_A00269_0205_BHJ7HFDMXX_fastqs_analysis/MS-TCZ-1_pool_1-1/filtered_feature_bc_matrix/barcodes.tsv \
--out /mnt/usb-storage/ihg-client.ucsf.edu/yej/190627_A00269_0205_BHJ7HFDMXX_fastqs_analysis/MS-TCZ-1_pool_1-1/pileup
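For readers who want to see what such a filter amounts to, here is a minimal sketch of the two conditions listed in the usage text (reads overlapping VCF SNPs, reads with a listed cell barcode). It is not the helper script's actual implementation. It assumes samtools 1.12 or later (for `-D`/`--tag-file`), barcodes stored in the `CB` tag, and SNP-only VCF records; the function names `vcf_to_bed` and `filter_bam_sketch` are hypothetical.

```shell
#!/usr/bin/env bash

# Convert VCF records to 0-based, half-open BED intervals.
# Assumes single-base (SNP) records, so each interval is [POS-1, POS).
vcf_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $2 - 1, $2 }' "$1"
}

# Keep only reads that overlap a SNP position and carry a listed cell barcode.
# Arguments: input BAM, barcodes file (one per line), VCF, output BAM.
filter_bam_sketch() {
    local in_bam="$1" barcodes="$2" vcf="$3" out_bam="$4"
    local bed
    bed="$(mktemp)"
    vcf_to_bed "$vcf" > "$bed" &&
        # -L restricts to BED regions; -D keeps reads whose CB tag is in the file.
        samtools view -b -L "$bed" -D "CB:$barcodes" -o "$out_bam" "$in_bam" &&
        samtools index "$out_bam"
    local status=$?
    rm -f "$bed"
    return "$status"
}
```

The function takes its arguments in the same order as the helper script's usage line, so it could be called as `filter_bam_sketch input.bam barcodes.tsv snps.vcf filtered.bam`.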
This is a really cool tool! Thanks @ghuls. It works well with a VCF file containing SNPs from the 1000GP, but it doesn't work with another one that comes from microarray data. The error message:
Error: Sorted input specified, but the file out.hg38.vcf has the following out of order record chr1 121275027 JHU_1.120748309 G . . . PR GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Any ideas of what could be happening?
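The message reads like a sort-order check failing on the VCF. As an assumption (not confirmed in this thread), the microarray VCF may no longer be in coordinate order, which can happen after a liftover. The sketch below shows one way to find the first out-of-order record and to produce a coordinate-sorted copy. The function names are hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```shell
#!/usr/bin/env bash

# Print the first record whose position is lower than the previous
# record on the same chromosome; print nothing if none is found.
first_unsorted_record() {
    grep -v '^#' "$1" | awk -F'\t' '
        $1 == chrom && $2 + 0 < pos { print; exit }
        { chrom = $1; pos = $2 + 0 }'
}

# Write the header unchanged, then the records sorted by chromosome
# name (byte order) and numeric position.
sort_vcf() {
    grep '^#' "$1"
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

Running `sort_vcf out.hg38.vcf > out.hg38.sorted.vcf` and passing the sorted file to the filter script would test whether ordering is the cause.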