graphtyper icon indicating copy to clipboard operation
graphtyper copied to clipboard

Fatal error "FAI index has no entry" for GRCh38 HLA alt contig

Open seboyden opened this issue 3 years ago • 6 comments

Running graphtyper on GRCh38 with alt contigs, I get the following fatal error on chr6:

<error> [constructor.cpp:159] FAI index has no entry for contig/chromosome 'HLA-DRB1*13'

Both the FASTA file and its FAI index actually contain these contigs: HLA-DRB1*13:01:01 HLA-DRB1*13:02:01

Is graphtyper truncating the contig name at the colon?

seboyden avatar Aug 06 '20 17:08 seboyden

Hello,

The region is expected to be of the format <contig>[:<begin>[-<end>]] so the colon in the contig name is most likely mistakenly taken as a delimiter. The behaviour could be changed such that the last colon is used as a delimiter instead of the first but then you would to specify a begin position, i.e. graphtyper genotype ... --region=HLA-DRB1*13:01:01:1 if you want to genotype the whole contig. Would that be a sufficient fix for you?

Best, Hannes

hannespetur avatar Aug 07 '20 10:08 hannespetur

To clarify, I did not include the HLA alt contigs in my regions file, I only included the primary contigs (chr1-chrX,chrY,chrM). The error occurred while processing chr6 (the HLA locus is on chr6), apparently because graphtyper was re-aligning reads from chr6 to the HLA alt contig. More context for the error:

[2020-08-05 21:49:22.632379] <debug> constructor.cpp:1603 Constructing graph for region chr6:31999001-33201000
[2020-08-05 21:49:22.632437] <debug> constructor.cpp:1605 Reading FASTA file located at human_g1k_v38_decoy_phix.fasta
[2020-08-05 21:49:22.642624] <debug> [constructor.cpp:1640] Reading VCF file located at 1000G_manta.diploidSV.svimmer.vcf.gz
[2020-08-05 21:49:22.653146] <debug> [constructor.cpp:1172] Transformed an SV deletion @ 32345991 with size diff -323
[2020-08-05 21:49:22.653238] <debug> [constructor.cpp:1514] A deletion of size 323 @ 32345991
[2020-08-05 21:49:22.653318] <debug> [constructor.cpp:1172] Transformed an SV deletion @ 32410069 with size diff -796
[2020-08-05 21:49:22.653382] <debug> [constructor.cpp:1514] A deletion of size 796 @ 32410069
[2020-08-05 21:49:22.653471] <debug> [constructor.cpp:1172] Transformed an SV deletion @ 32449157 with size diff -102
[2020-08-05 21:49:22.653534] <debug> [constructor.cpp:1514] A deletion of size 102 @ 32449157
[2020-08-05 21:49:22.653608] <debug> [constructor.cpp:1502] A breakend ]chr6_GL000255v2_alt:3717535]T @ 32518892
[2020-08-05 21:49:22.653676] <debug> [constructor.cpp:448] BND variant case 3 @ 32518892
[2020-08-05 21:49:22.653746] <debug> [constructor.cpp:1502] A breakend ]chr6_GL000251v2_alt:3938616]T @ 32521484
[2020-08-05 21:49:22.653819] <debug> [constructor.cpp:448] BND variant case 3 @ 32521484
[2020-08-05 21:49:22.653886] <debug> [constructor.cpp:1502] A breakend ]chr6_GL000255v2_alt:3722273]T @ 32523578
[2020-08-05 21:49:22.653947] <debug> [constructor.cpp:448] BND variant case 3 @ 32523578
[2020-08-05 21:49:22.654015] <debug> [constructor.cpp:1502] A breakend A]HLA-DRB1*13:01:01:4698] @ 32526272
[2020-08-05 21:49:22.654076] <error> [constructor.cpp:159] FAI index has no entry for contig/chromosome 'HLA-DRB1*13'

seboyden avatar Aug 07 '20 16:08 seboyden

Oh I see. That makes things a little simpler to fix.

hannespetur avatar Aug 07 '20 16:08 hannespetur

Note that the HLA alt contigs with colons in the names are specific to the Broad's GATK bundle version of the hg38 reference, they are not found in the NCBI/UCSC analysis set versions: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

Web browser:

ftp://[email protected]/bundle/

FTP client:

location: ftp.broadinstitute.org/bundle
username: gsapubftp-anonymous
password:

seboyden avatar Aug 11 '20 19:08 seboyden

Update-- I tried re-running graphtyper providing a different reference FASTA file without HLA alt contigs. However, I got a similar error referring to a missing HLA contig in the FAI index, while processing chr6. It appears the problem is where Manta called a BND between chr6 and an HLA alt contig in the reference the input crams were originally aligned to (so I can't avoid the problem at graphtyper just by specifying a different reference). Graphtyper may not be parsing the full HLA contig name out of the ALT field in the svimmer VCF.

Svimmer VCF:

chr6    29942357        .       A       A[HLA-A*02:89:103[      0       .       SVTYPE=BND;MATEID=MantaBND:

Graphtyper stderr:

[2020-08-11 22:39:52.763248] <info> SV genotyping region chr6:29000001-30000000
[2020-08-11 22:39:52.763310] <info> Path to genome is 'GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna'
...
[2020-08-11 22:39:52.763682] <info> Padded region is: chr6:28999001-30201000
[2020-08-11 22:39:52.763741] <info> Constructing graph.
...
[2020-08-11 22:39:52.818587] <debug> [constructor.cpp:1502] A breakend A[HLA-A*02:89:103[ @ 29942357
[2020-08-11 22:39:52.818647] <error> [constructor.cpp:159] FAI index has no entry for contig/chromosome 'HLA-A*02'

seboyden avatar Aug 14 '20 19:08 seboyden

Thanks for the update. You will need to remove any HLA BND records to work around the issue for now. I hope I will have time to work on a fix soon though.

Best, Hannes

hannespetur avatar Aug 18 '20 08:08 hannespetur