Multithreading during M3VCF/MSAV Generation?

Open mragsac opened this issue 1 year ago • 3 comments

I am trying to generate a custom reference with Minimac3 (M3VCF) and Minimac4 (MSAV), and was wondering whether these operations are multithreaded or can be made to run multithreaded.

Commands to Generate Reference Files

# Minimac3
Minimac3 --refHaps chr${chr}.vcf.gz --processReference --prefix m3vcfs/chr${chr} --myChromosome {chr_prefix} --rsid

# Minimac4
minimac4 --compress-reference reference.{sav,bcf,vcf.gz} > reference.msav

When I try using the --cpus flag, it doesn't seem like the CPUs I have available are being used when I'm checking on things with htop...

mragsac avatar Jul 28 '23 20:07 mragsac

The multithreading option only takes effect during imputation.

yukt avatar Jul 28 '23 20:07 yukt

Thank you for your speedy reply!!

Is it expected that a single chromosome from an imputation panel would take multiple days to compress to the M3VCF or MSAV format? I'm trying to understand if there are issues on my end in running things or if this is expected behavior ...

mragsac avatar Jul 28 '23 20:07 mragsac

Yes, it can take a long time for large reference panels. With Minimac4, you can speed up the compression by running multiple processes (instead of threads), one per chunk, and then concatenating the chunks:

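# Compress each 10 Mb chunk in its own process. The -i 'POS>=...' filter drops records
# that start before the chunk, so no variant ends up in two chunks.
bcftools view chr1.vcf.gz -Ou -r chr1:1-10000000 -i 'POS>=1' | minimac4 --compress-reference /dev/stdin > chr1_1_10000000.msav
bcftools view chr1.vcf.gz -Ou -r chr1:10000001-20000000 -i 'POS>=10000001' | minimac4 --compress-reference /dev/stdin > chr1_10000001_20000000.msav
 ...
# Concatenate the chunks in positional order (sort -V orders the numeric ranges naturally).
sav concat $( ls chr1_*.msav | sort -V ) -o chr1.msav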

I don't know for sure whether this approach is possible for minimac3.

bcftools: https://github.com/samtools/bcftools
sav: https://github.com/statgen/savvy/releases/download/v2.1.0/savvy-2.1.0-Linux-x86_64-cli.sh
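
For a whole chromosome, the per-chunk commands above could be generated and run in parallel along these lines. This is only a rough sketch, not something from the Minimac4 docs: the function names (chunk_ranges, compress_all), the chromosome length, the chunk size and the job count are all placeholders I made up, and it assumes your xargs supports -P (GNU and BSD xargs do). The bcftools, minimac4 and sav invocations are the same ones as above.

```shell
# Print one "start end" pair per chunk, covering positions 1..chr_len.
# (Hypothetical helper; runs in a subshell so its variables stay local.)
chunk_ranges() (
  chr_len=$1
  chunk=$2
  start=1
  while [ "$start" -le "$chr_len" ]; do
    end=$(( start + chunk - 1 ))
    if [ "$end" -gt "$chr_len" ]; then
      end=$chr_len
    fi
    echo "$start $end"
    start=$(( end + 1 ))
  done
)

# Compress every chunk of <chr>.vcf.gz with up to <jobs> processes at once,
# then concatenate the chunks in positional order.
# Inside the sh -c script: $0 = chromosome name, $1 = chunk start, $2 = chunk end.
compress_all() (
  chr=$1
  chr_len=$2
  chunk=$3
  jobs=$4
  chunk_ranges "$chr_len" "$chunk" \
    | xargs -P "$jobs" -n 2 sh -c '
        bcftools view "${0}.vcf.gz" -Ou -r "${0}:${1}-${2}" -i "POS>=${1}" \
          | minimac4 --compress-reference /dev/stdin > "${0}_${1}_${2}.msav"
      ' "$chr"
  sav concat $( ls "${chr}"_*.msav | sort -V ) -o "${chr}.msav"
)
```

After pasting the functions into a shell (or sourcing them from a file), something like `compress_all chr1 250000000 10000000 8` would process chr1 in 10 Mb chunks with 8 concurrent processes. The length only needs to be at least the last variant position, since a chunk range that runs past the end just yields fewer records. Each process holds its own chunk in memory, so pick the job count with RAM in mind.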

jonathonl avatar Jul 28 '23 20:07 jonathonl