biscuit icon indicating copy to clipboard operation
biscuit copied to clipboard

[fetch_refseq:40] Error, cannot retrieve reference chr1:0-0.

Open ttriche opened this issue 9 years ago • 16 comments

After marking dupes, the following

biscuit pileup -q 16 -r hg19.fa -i WGBS_$SAMPLE.mkdups.hg19.bam -o WGBS_$SAMPLE.mkdups.hg19.vcf

fails on a number of samples (4/6, not all though). Is there a straightforward way to debug this? Or just to grep out the offending read[s]? The runs that fail are all from the same flow cell IIRC. I don't see this behavior elsewhere and can't seem to find anything on Google. Eventually I'll look into the source... :-/

ttriche avatar Jan 29 '16 14:01 ttriche

Hi ttriche, Did you find any way to fix this error?

rbpisupati avatar May 25 '16 09:05 rbpisupati

Hi rahbz, are you seeing this? Could you elaborate on how to reproduce? I tried to reproduce this on my side but in vain. I must be missing something.

zwdzwd avatar May 25 '16 13:05 zwdzwd

Hi, I have run it with this command and it ended up giving me an error.

~/Softwares/biscuit-0.1.4.20160330/bin/biscuit pileup -r ~/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam ##fileformat=VCFv4.1 ##reference=/home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa ##source=biscuitV0.1.3.20160324 ##contig=<ID=Chr1,length=30427671> ##contig=<ID=Chr2,length=19698289> ##contig=<ID=Chr3,length=23459830> ##contig=<ID=Chr4,length=18585056> ##contig=<ID=Chr5,length=26975502> ##contig=<ID=ChrC,length=154478> ##contig=<ID=ChrM,length=366924> ##program=<cmd=biscuit pileup -r /home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam> ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data"> ##INFO=<ID=CX,Number=1,Type=String,Description="Cytosine context (CG, CHH or CHG)"> ##INFO=<ID=N5,Number=1,Type=String,Description="5-nucleotide context, centered around target cytosine"> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Raw read depth"> ##FORMAT=<ID=SP,Number=R,Type=String,Description="Allele support (before bisulfite conversion, with filtering)"> ##FORMAT=<ID=CV,Number=1,Type=Integer,Description="Effective (strand-specific) coverage on cytosine"> ##FORMAT=<ID=BT,Number=1,Type=Float,Description="Cytosine methylation fraction (aka beta value, with filtering)"> ##FORMAT=<ID=GT,Number=1,Type=Integer,Description="Genotype from normal"> ##FORMAT=<ID=GL,Number=G,Type=Integer,Description="Genotype likelihoods"> ##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype quality (phred-scaled)"> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified [fetch_refseq:62] Error, cannot retrieve reference Chr1:1-0.

But I think I have figured out the error. If I change the reference genome modifying the header of fasta to just the Chromosome name instead of all the information I seemed to get no error. I dont know why that is the case. So @ttriche, I think you should modify your genome fasta headers to just the chromosome name.

Thanks, Rahul

rbpisupati avatar May 25 '16 13:05 rbpisupati

it disappeared once I updated biscuit and sorted the BAMs (verified sort order by indexing the result prior to bedgraph/vcf production)

--t

On Wed, May 25, 2016 at 6:12 AM, Wanding Zhou [email protected] wrote:

Hi rahbz, are you seeing this? Could you elaborate on how to reproduce? I tried to reproduce this on my side but in vain. I must be missing something.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/zwdzwd/biscuit/issues/3#issuecomment-221571395

ttriche avatar May 25 '16 16:05 ttriche

re: headers: I just used hg19 from UCSC, so that reason seems unlikely

--t

On Wed, May 25, 2016 at 6:33 AM, Rahul Bharadwaj [email protected] wrote:

Hi, I have run it with this command and it ended up giving me an error. ~/Softwares/biscuit-0.1.4.20160330/bin/biscuit pileup -r ~/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam ##fileformat=VCFv4.1 ##reference=/home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa ##source=biscuitV0.1.3.20160324 ##contig= ##contig= ##contig= ##contig= ##contig= ##contig= ##contig= ##program= ##INFO= ##INFO= ##INFO= ##FORMAT= ##FORMAT= ##FORMAT= ##FORMAT= ##FORMAT= ##FORMAT= ##FORMAT= #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified [fetch_refseq:62] Error, cannot retrieve reference Chr1:1-0.

But I think I have figured out the error. If I change the reference genome modifying the header of fasta to just the Chromosome name instead of all the information I seemed to get no error. I dont know why that is the case. So @ttriche https://github.com/ttriche, I think you should modify your genome fasta headers to just the chromosome name.

Thanks, Rahul

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/zwdzwd/biscuit/issues/3#issuecomment-221577192

ttriche avatar May 25 '16 16:05 ttriche

Yeah even I thought the same, I have the same headers from Arabidopsis, tair10. Dont know really why that was the case.

rbpisupati avatar May 25 '16 16:05 rbpisupati

Hello,

I have the exact same problem even after cloning the latest version of biscuit and sorting/indexing my bam files. Did someone figure out what would be the issue ?

Thanks

jleluyer avatar Oct 23 '16 04:10 jleluyer

Hello jleluyer, try to rename headers in your reference fasta file with just chromosome names. it should work then!

rbpisupati avatar Oct 23 '16 13:10 rbpisupati

Hello,

They are just with scaffold names: i.e >sp_scaff00001. Is the way there are called a problem ? thanks

jleluyer avatar Oct 23 '16 14:10 jleluyer

I got similar error here "[fetch_refcache:94] Error, cannot retrieve reference" not chr1 though. It errors out in either HPV-mCG2 or HPV-mCG3 or both in the GDC reference genome https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

Modification of fasta header to only keep the contig name did solve the problem

ZhenyuZ avatar Aug 29 '17 14:08 ZhenyuZ

Hello, I'm getting a similar error: Command: biscuit pileup -g "chr6:1-10000" -o out.vcf GRCh38_p12_chr6.fa input_1_sorted.bam input_2_sorted.bam input_3_sorted.bam input_4_sorted.bam

Error: [refcache_fetch:85] Error, cannot retrieve reference chr6:0-0

The reference fasta I'm using only contains the GRCh38 assembled chromosome 6. I changed the header from ">CM000668.2 Homo sapiens chromosome 6, GRCh38 reference primary assembly" to just ">chr6" and it still does not resolve the problem.

EDIT: I've figured out the issue. I didn't index the reference fasta again after changing the header.

minghao4 avatar Mar 09 '18 19:03 minghao4

Hi all: I'm having the same error. Command: biscuit pileup gamo_ref2.fa DC050biscuitoutput.bam

Error: [refcache_fetch:85] Error, cannot retrieve reference Chr1:0-0.

The resulting output vcf has correct headers but no data. I previously ran Biscuit pileup correctly using a consensus pileup reference created from my reads, I have also run it without problems using a zebra finch genome. However, I recently tried to re-run the analysis using a recently created genome for my species and have been unable to correctly run the pileup command.

Following the advice above in the thread I renamed all the chromosomes to have simpler names: the reference fasta now has the following chromosomes: `>Chr1

Chr1A Chr2 Chr3 Chr4 Chr4A Chr5 Chr6 Chr7 Chr8 Chr9 Chr10 Chr11 Chr12 Chr13 Chr14 Chr15 Chr17 Chr18 Chr19 Chr20 Chr21 Chr22 Chr23 Chr24 Chr25 Chr26 Chr27 Chr28 ChrZ ChrSS1 ChrSS2 ChrSS3 ChrSS4 ChrUS1 ChrUS2 ChrUS3 ChrUS4 ChrUS5`

I indexed the fasta and aligned my reads again and pileup still will not work.

Any suggestions would be greatly appreciated.

smcnew avatar Jul 10 '19 20:07 smcnew

I have run into this issue quite a bit, oddly, it doesn't seem to be deterministic (sometimes it works, sometimes not). Say I have the following files in my working directory:

sample.bam
ref.fa

if I run biscuit pileup ref.fa sample.bam, I might get the following output:

[fai_load] build FASTA index.
...
[fai_load] build FASTA index.
[refcache_fetch:96] Error, cannot retrieve reference chr1:0-0.

However, if i run samtools faidx in the directory (creating ref.fa.fai), then run biscuit pileup as above, it seems to work without error.

Based on this I'm guessing that the built-in faidx generation in biscuit is buggy (perhaps using an out-of-date htslib function?). Suggest generating a fresh faidx using samtools before running biscuit pileup.

njspix avatar Mar 31 '23 14:03 njspix

I don't recall biscuit generates faidx in any of its function. Yeah use samtools.

zwdzwd avatar Mar 31 '23 14:03 zwdzwd

Perhaps we should document that a faidx is necessary to run the command, even though it's not specified in the command...

njspix avatar Mar 31 '23 15:03 njspix

Thank you @njspix ! I followed your suggestion and it worked for me: Using "samtools faidx" instead of "biscuit index"

I hope this modification can be documented.

somnya avatar Jan 29 '24 14:01 somnya