Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele

Open bbimber opened this issue 1 year ago • 9 comments

Hello,

I am trying to run liftover on a moderately sized dataset (500K PacBio-derived SVs) to lift data from the macaque genome to GRCh37. I am running the command below, but it reports:

Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position 1:248821847

However, I think I am providing "-s" to the command. Is there a different argument I am missing, or is the tool expecting some other kind of input? Thanks for any ideas.

	bcftools +liftover \
		--no-version \
		-Ou \
		--threads 18 \
		-o $VCF_LIFTED \
		$VCF_NORM \
		-- \
		-s <MMul10_Genome_Fasta> \
		-f <GRCH37_FASTA> \
		-c $CHAIN_FILE \
		--reject $UNMAPPED \
		--reject-type z \
		--write-src \
		--fix-tags

Also, I would not have thought 500K SVs is that big a dataset, but this has been running for days (even with 18 threads), which seems rather extreme.

bbimber avatar Jun 05 '24 22:06 bbimber

This could be a mistake in how the plugin identifies which variants are symbolic variants. Can you share with me an example that reproduces the issue? Also, using multiple threads only affects compression/decompression, so most likely you don't need that many threads. The liftover step is not multi-threaded. It should run in a few seconds. If it is taking this long, it means you found a bug in the code.

freeseek avatar Jun 10 '24 18:06 freeseek

@freeseek: If you're willing to have a look, I'm happy to share any of this. The input file is ~200 MB; however, it reproduces the warning quite quickly. It also basically hangs after starting the tool, with no obvious work happening (nothing is being written). If I posted the file, would you consider this, or do you want a more minimal input?
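One way to cut a smaller test input from a large VCF is sketched below. It keeps every header line plus the first N records of an uncompressed VCF text stream. The function name `vcf_head` is hypothetical and not part of bcftools, and the subset is only useful if the warning still reproduces on it.

```shell
# Hypothetical helper (not part of bcftools): keep all header lines
# plus the first N data records of a VCF text stream read from stdin.
# N is the first argument and defaults to 1000.
vcf_head() {
	awk -v n="${1:-1000}" '
		/^#/ { print; next }               # header lines always pass through
		seen < n + 0 { print; seen++ }     # then at most n data records
	'
}

# Usage, assuming bcftools is installed and $VCF_NORM is the input from the command above:
#   bcftools view "$VCF_NORM" | vcf_head 1000 > subset.vcf
```

The function reads plain text only, so compressed or BCF input needs to be decoded first (for example with `bcftools view`, as in the commented usage line).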

bbimber avatar Jun 13 '24 18:06 bbimber

That's perfect. If you could email me a link to the VCF and a link to the MMul10_Genome_Fasta file, that would be great.

freeseek avatar Jun 13 '24 19:06 freeseek

@freeseek I am running into the same problem. Were you able to fix the problem or did you find out how to solve it?

	bcftools +liftover --no-version -Ou GWAS.bcf -- \
		-s human_g1k_v37.fasta \
		-f GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
		-c hg19ToHg38.over.chain.gz \
		--reject GWAS_unlifted.bcf \
		--reject-type b \
		--write-src | \
	bcftools sort -o GWAS_Hg38.bcf -Ou --write-index

Thanks in advance for any help!

PhBeeken avatar Apr 01 '25 17:04 PhBeeken

Can you try the development version here to see if it fixes your problem?

freeseek avatar Apr 01 '25 21:04 freeseek

I have tried the development version, but the issue still persists. I checked my VCF file, and although it does not include a SNP at position 779047, I still receive the warning: Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position chr1:779047. The same warning also appears for many other positions that are not present in my VCF file.

PhBeeken avatar Apr 02 '25 10:04 PhBeeken

Position chr1:779047 would be the position after the liftover. Is it possible that you have variants without any alternate alleles? I would like to be able to reproduce this error.
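A quick way to check for records without alternate alleles is sketched below. It assumes such records carry "." in the ALT column (the fifth tab-separated field) of the VCF text. The function name `count_no_alt` is hypothetical and not part of bcftools.

```shell
# Hypothetical helper (not part of bcftools): count VCF records whose
# ALT column is ".", i.e. records that list no alternate allele.
# Reads uncompressed VCF text on stdin; header lines (starting with #) are skipped.
count_no_alt() {
	awk -F'\t' '!/^#/ && $5 == "." { n++ } END { print n + 0 }'
}

# Usage, assuming bcftools is installed and GWAS.bcf is the input from the command above:
#   bcftools view GWAS.bcf | count_no_alt
```

A non-zero count would indicate that at least some input records have no alternate allele.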

freeseek avatar Apr 02 '25 20:04 freeseek

@freeseek I have sent you an email with some of the positions that are causing the issue.

PhBeeken avatar Apr 03 '25 15:04 PhBeeken

I had also reported this issue in #25.

jjfarrell avatar Apr 24 '25 13:04 jjfarrell