shapeit5 icon indicating copy to clipboard operation
shapeit5 copied to clipboard

Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed, when phasing rare alleles

Open TanguyLallemand opened this issue 2 years ago • 10 comments

Hello, @odelaneau ,

Thanks for the great work with shapeit5!!

I'm looking into implementing it in a nextflow pipeline (from modules made by @LouisLeNezet, thanks also to him!). I have a problem when phasing rare variants with shapeit5 either 5.0 or 5.1. Indeed the HMM computations are not done. Do you have any idea why this might be?

`` [SHAPEIT5] Phase_rare (phase rare variants onto a haplotype scaffold)

  • Authors : Simone RUBINACCI & Olivier DELANEAU, University of Lausanne
  • Contact : [email protected] & [email protected]
  • Version : 5.1.0 / commit = e4f5e58 / release = 2023-03-29
  • Run date : 04/04/2023 - 13:02:34

Files:

  • Unphased data : [split.chr18.vcf.gz]
  • Scaffold data : [chr18.ref.common.phased.vcf.gz]
  • Output data : [chr18_11000001_12000000.vcf.gz]

Parameters:

  • Seed : 15052011
  • Threads : 8 threads
  • PBWT : [depth = 2,2 / modulo = 0.1 / mac = 2 / mdr = 0.3]
  • HMM : [Ne = 15000 / Constant recombination rate of 1cM per Mb]

Parsing specified genomic regions

  • Scaffold region [chr18:10500001-12500000]
  • Input region [chr18:11000001-12000000]

Reading genotype data

  • BCF scanning ... [W::hts_idx_load3] The index file is older than the data file: split.chr18.vcf.gz.tbi [W::hts_idx_load3] The index file is older than the data file: chr18.ref.common.phased.vcf.gz.tbi
  • BCF scanning (35.15s)
    • Variant sites: [#scaffold=37997 | #rare=13362 | #all=51359]
    • 21696 rare variants in buffer regions excluded
    • 13362 variants found in both files [priority to scaffold]
    • 5265 multi-allelic variants excluded
  • HAP allocation [#scaffold=13362 / #samples=1842] (0.00s)
  • Genotype set allocation [#rare=37997 / #samples=1842] (0.00s)
  • BCF parsing ...
  • BCF parsing [100%]
  • Plain VCF/BCF parsing (36.04s)
    • Scaffold : [0/0=23191904 0/1=821979 1/1=598921]
    • Rare : [0/0=65436271 0/1=1925356 1/1=1326764 ./.=1302083]

Initializing sparse genotypes

  • 0 missing genotypes imputed at monomorphic sites
  • Genotype set transpose V2I (0.05s)

Setting up genetic map

  • Region length [1499962 bp / 1.5 cM (assuming 1cM per Mb)]
  • HMM parameters [Ne=15000 / Error=0.0001]
  • V2H transpose (0.03s)

PBWT pass

  • PBWT initialization [#eval=10283 / #select=6] (0.00s)

  • IBD2 constraints [100%]

  • IBD2 constraints [#inds=0 / #pairs=0] (0.42s)

  • PBWT forward selection [100%]

  • PBWT forward selection (2.65s)

  • PBWT backward selection [100%]

  • PBWT backward selection (2.87s)

    • #states=229.97+/-88.68
    • #collisions = 17789 / #pushes = 70642 / rate = 79.88%

HMM computations

  • Processing [0%] phase_rare_static: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32&, aligned_vector32&, std::vector&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed. ``

TanguyLallemand avatar Apr 04 '23 13:04 TanguyLallemand

Hi,

Can you also send the command line you run, please? For both phase_common and phase_rare?

Best,

Olivier Delaneau

Le mar. 4 avr. 2023 à 15:11, Tanguy Lallemand @.***> a écrit :

Hello, @odelaneau https://github.com/odelaneau ,

Thanks for the great work with shapeit5!!

I'm looking into implementing it in a nextflow pipeline (from modules made by @LouisLeNezet https://github.com/LouisLeNezet, thanks also to him!). I have a problem when phasing rare variants with shapeit5 either 5.0 or 5.1. Indeed the HMM computations are not done. Do you have any idea why this might be?

`` [SHAPEIT5] Phase_rare (phase rare variants onto a haplotype scaffold)

Files:

  • Unphased data : [split.chr18.vcf.gz]
  • Scaffold data : [chr18.ref.common.phased.vcf.gz]
  • Output data : [chr18_11000001_12000000.vcf.gz]

Parameters:

  • Seed : 15052011
  • Threads : 8 threads
  • PBWT : [depth = 2,2 / modulo = 0.1 / mac = 2 / mdr = 0.3]
  • HMM : [Ne = 15000 / Constant recombination rate of 1cM per Mb]

Parsing specified genomic regions

  • Scaffold region [chr18:10500001-12500000]
  • Input region [chr18:11000001-12000000]

Reading genotype data

  • BCF scanning ... [W::hts_idx_load3] The index file is older than the data file: split.chr18.vcf.gz.tbi [W::hts_idx_load3] The index file is older than the data file: chr18.ref.common.phased.vcf.gz.tbi
  • BCF scanning (35.15s)
    • Variant sites: [#scaffold=37997 | #rare=13362 | #all=51359]
    • 21696 rare variants in buffer regions excluded
    • 13362 variants found in both files [priority to scaffold]
    • 5265 multi-allelic variants excluded
  • HAP allocation [#scaffold=13362 / #samples=1842] (0.00s)
  • Genotype set allocation [#rare=37997 / #samples=1842] (0.00s)
  • BCF parsing ...
  • BCF parsing [100%]
  • Plain VCF/BCF parsing (36.04s)
    • Scaffold : [0/0=23191904 0/1=821979 1/1=598921]
    • Rare : [0/0=65436271 0/1=1925356 1/1=1326764 ./.=1302083]

Initializing sparse genotypes

  • 0 missing genotypes imputed at monomorphic sites
  • Genotype set transpose V2I (0.05s)

Setting up genetic map

  • Region length [1499962 bp / 1.5 cM (assuming 1cM per Mb)]
  • HMM parameters [Ne=15000 / Error=0.0001]
  • V2H transpose (0.03s)

PBWT pass

PBWT initialization [#eval=10283 / #select=6] (0.00s)

IBD2 constraints [100%]

IBD2 constraints [#inds=0 / #pairs=0] (0.42s)

PBWT forward selection [100%]

PBWT forward selection (2.65s)

PBWT backward selection [100%]

PBWT backward selection (2.87s)

  • #states=229.97+/-88.68
    • #collisions = 17789 / #pushes = 70642 / rate = 79.88%

HMM computations

  • Processing [0%] phase_rare_static: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32&, aligned_vector32&, std::vector&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed. ``

— Reply to this email directly, view it on GitHub https://github.com/odelaneau/shapeit5/issues/20, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD4XTIO4X66UFKGXP6MVTVLW7QMY5ANCNFSM6AAAAAAWSWIMTE . You are receiving this because you were mentioned.Message ID: @.***>

odelaneau avatar Apr 06 '23 07:04 odelaneau

Hi,

Yes, sure here they are.

phase_common_static
--input split.chr18.full_pop.vcf.gz
--region chr18:1-4000000
--thread 8
--filter-maf 0.001
--output-format bcf
--output chr18_1_1000000.bcf

Between this two line i have ligated all file produced with phase common, producing chr18.ref.common.phased.vcf.gz.

phase_rare_static
--input split.chr18.full_pop.vcf.gz
--scaffold chr18.ref.common.phased.vcf.gz
--input-region chr18:1-1000000
--scaffold-region chr18:1-1500000
--thread 8
--output chr18_1_1000000.vcf.gz

I am running those files in a docker container based on your released one.

Regards,

T

TanguyLallemand avatar Apr 07 '23 13:04 TanguyLallemand

Sorry, I come back to this quite late. Is this problem still happening even with last version.

Two comments anyway:

  1. Try increasing your chunk sizes for phase_Rare.
  2. Check if there is nothing wrong with your indexes as you have this warning showing up:

[W::hts_idx_load3] The index file is older than the data file: split.chr18.vcf.gz.tbi [W::hts_idx_load3] The index file is older than the data file: chr18.ref.common.phased.vcf.gz.tbi_

odelaneau avatar May 08 '23 14:05 odelaneau

I am having the same problem with phase_rare version 5.1.1 / commit = 990ed0d / release = 2023-05-08. These are the commands I have used:

phase_common \
  --thread 4 \
  --input "143_HESC_Genome.as.12.qc.bcf" \
  --reference "ALL.chr4.phase3_integrated.20130502.genotypes.bcf" \
  --map genetic_map.txt \
  --region 4:135951881-191154276 \
  --filter-maf 0.001 \
  --output "143_HESC_Genome.as.12.qc.scaffold.bcf"
phase_rare \
  --thread 4 \
  --input "143_HESC_Genome.as.12.qc.bcf" \
  --scaffold "143_HESC_Genome.as.12.qc.scaffold.bcf" \
  --map genetic_map.txt \
  --input-region 4:138482819-191154276 \
  --scaffold-region 4:135951881-191154276 \
  --output "143_HESC_Genome.as.12.qc.pgt.bcf"

After phase_rare outputs HMM computations I get the error:

phase_rare: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32<float>&, aligned_vector32<float>&, std::vector<unsigned int>&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed.

I can share via email this test case if you want to reproduce the issue.

freeseek avatar Jun 22 '23 17:06 freeseek

Thanks for SHAPEIT. It is a great tool. I'm having the same issue described here. Same version as above: v5.1.1 - 990ed0d. I have gone back and increased the chunk size so there are >340k biallelic SNPs, removed variants with excess (>10%) missingness and excess heterozygosity, removed samples with excess missingness and outliers for heterozygosity, and changed the random seed. phase_common_static runs, but I get the same error each time for phase_rare_static at the HMM computations step:

phase_rare_static: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32<float>&, aligned_vector32<float>&, std::vector<unsigned int>&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed.

Commands used:

phase_common_static \
  --input chrom7_array_exome_combined_qcFiltered.vcf.gz \
  --filter-maf 0.001 \
  --region 7 \
  --map chr7.b38.gmap.gz \
  --output chrom7_array_exome_combined_qcFiltered_phased.bcf \
  --thread 64 \
  --log chrom7_array_exome_combined_qcFiltered_phased.log  

phase_rare_static \
  --input chrom7_array_exome_combined_qcFiltered.vcf.gz \
  --scaffold chrom7_array_exome_combined_qcFiltered_phased.bcf \
  --map chr7.b38.gmap.gz \
  --output chrom7_array_exome_combined_qcFiltered_phased_rare.bcf \
  --log chrom7_array_exome_combined_qcFiltered_phased_rare.log \
  --scaffold-region "7:100096734-131496019" \
  --input-region "7:101096734-130496019" \
  --thread 64 \
  --seed 47 

rebeccaito avatar Aug 10 '23 14:08 rebeccaito

Any news on this error? I'm also getting this error, but for some chunks only. What do you think might be the source of the error? Could it be some weird variants/format in the chunk that is being processed?

MoiColl avatar Sep 14 '23 12:09 MoiColl

Hi all!

I was getting the same error when trying to phase rare variants. Actually, I was getting two flavors of the same error:

Assertion GRvar_genotypes[vr][tidx].prob >= 0.0f failed. or Assertion !isnan(GRvar_genotypes[vr][tidx].prob) failed.

I managed to fix it by polishing the vcf. In particular, I updated the INFO tags (with bcftools +fill-tags). Perhaps, shapeit requires proper values in the AF field or something else.

It would be nice if the upcoming versions of shapeit5 were a bit more explicit about this kind of fails or at least the requirements for the vcf were outlined in the documentation :).

vasilipankratov avatar Dec 07 '23 09:12 vasilipankratov

I was having the same issue for some of the chunks when using phase rare. I fixed it by increasing the chunk size.

Best

jalghor avatar Jan 13 '24 14:01 jalghor

I was having the same issue for some of the chunks when using phase rare. I fixed it by increasing the chunk size.

Best

@jalghor mean you generated the chunks for rare variants with more than --window-cm arg (=4) by GLIMPSE2_chunk or referring the seed size --seed arg (=15052011) Seed of the random number generator in phase_rare. Can you provide the value you used to overcome the failure? In my phase_common output AF is missing in the INFO field, same as @vasilipankratov mentioned. Wonder which one would fix the error or which one to consider if the results vary! Thanks

justinjj24 avatar Feb 19 '24 05:02 justinjj24

Hi all! Are there any news on this error? I have tried increasing the chunk size, but still got the error. However, independent of chunk size, I am getting the error only for some chunks, so I think in my case the chunk size is not the issue (generated chunks with glimpse2 as suggested). I also tried bcftools +fill-tags and VCF format is fine. I would appreciate any update. Thanks!

JuliaMarkowski avatar Sep 16 '24 20:09 JuliaMarkowski