Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele
Hello,
I am trying to run liftover on a moderately sized VCF (500K PacBio-derived SVs) to lift data from the macaque genome to GRCh37. I am running the command below, but it reports:
Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position 1:248821847
However, I believe I am providing -s (the short form of --src-fasta-ref) to the command. Is there a different argument I am missing, or is the tool expecting some other kind of input? Thanks for any ideas.
bcftools +liftover \
--no-version \
-Ou \
--threads 18 \
-o $VCF_LIFTED \
$VCF_NORM \
-- \
-s <MMul10_Genome_Fasta> \
-f <GRCH37_FASTA> \
-c $CHAIN_FILE \
--reject $UNMAPPED \
--reject-type z \
--write-src \
--fix-tags
Also, I would not have thought 500K SVs was that large a dataset, but this has been running for days (even with 18 threads), which seems rather extreme.
This could be a bug in how the plugin identifies symbolic variants. Can you share an example that reproduces the issue? Also, multiple threads only affect compression/decompression, so you most likely don't need that many threads; the liftover step itself is not multi-threaded. It should run in a few seconds. If it is taking this long, you have found a bug in the code.
@freeseek: If you're willing to have a look, I'm happy to share any of this. The input file is ~200 MB; however, it reproduces the warning quite quickly. It also basically hangs after starting, with no obvious work happening (nothing is being written). If I posted the file, would you take a look, or would you prefer a more minimal input?
That's perfect. If you could email me a link to the VCF and a link to the MMul10_Genome_Fasta file, that would be great.
@freeseek I am running into the same problem. Were you able to fix it, or did you find a workaround?
bcftools +liftover --no-version -Ou GWAS.bcf -- \
-s human_g1k_v37.fasta \
-f GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
-c hg19ToHg38.over.chain.gz \
--reject GWAS_unlifted.bcf \
--reject-type b \
--write-src | \
bcftools sort -o GWAS_Hg38.bcf -Ou --write-index
Thanks in advance for any help!
Can you try the development version here to see if it fixes your problem?
I have tried the development version, but the issue still persists.
I checked my VCF file, and although it does not include a SNP at position 779047, I still receive the warning:
Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position chr1:779047
The same warning also appears for many other positions that are not present in my VCF file.
Position chr1:779047 would be the position after the liftover, so it need not appear in your input VCF. Is it possible that you have variants without any alternate alleles? I would like to be able to reproduce this error.
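One quick way to answer that question is to look for records whose ALT column is "." (the VCF convention for "no alternate allele"). The helper below is a minimal sketch, not part of bcftools; the function name list_no_alt is hypothetical, and it assumes uncompressed VCF text on stdin with ALT in the 5th tab-separated column, as the VCF format specifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcftools): print VCF data lines whose
# ALT column (5th tab-separated field) is ".", i.e. records with no
# alternate allele. Reads uncompressed VCF text on stdin.
list_no_alt() {
  # Skip header lines ('##' meta lines and the '#CHROM' line),
  # then keep only rows where ALT is exactly "."
  awk -F '\t' '!/^#/ && $5 == "."'
}
```

Since bcftools view writes uncompressed VCF to stdout by default, the helper can be applied to a BCF directly, e.g. to inspect or count such records:
bcftools view GWAS.bcf | list_no_alt | head
bcftools view GWAS.bcf | list_no_alt | wc -l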
@freeseek I have sent you an email with some of the positions that are causing the issue.
I had also reported this issue in #25.