Add --verbosity flag like samtools
I've been using bcftools with Google Cloud Storage support, and it would be very useful to be able to pass a verbosity level into htslib in order to understand how the HTTP requests are behaving.
It seems that although samtools has a way to set verbosity, bcftools does not.
To be more specific: I'm trying to understand failures when writing to Google Cloud Storage, and I'm struggling to get any log messages at all.
I'm invoking bcftools like this:
export GCS_OAUTH_TOKEN=$(gcloud auth print-access-token)
export GCS_REQUESTER_PAYS_PROJECT=my-project
bcftools view -i 'FMT/GQ>=20 & FMT/AD > 10 & F_MISSING <= 0.1' \
-q 0.01 -f PASS -Ob1 -o gs://my-bucket/output.bcf --threads $(nproc) gs://source-bucket/input.vcf.gz && \
bcftools index -f gs://my-bucket/output.bcf
I'm seeing intermittent failures and need visibility into what's happening. The command often runs successfully for many hours at a time: sometimes it writes out the whole file, and sometimes it fails with only this message: [main_vcfview] Error: cannot write to gs://my-bucket/output.bcf
I am confident that my tokens are valid and have read/write access to those buckets, because I can consistently read and write small files with them.
@pettyalex - strace may help here; it can give you visibility into system calls.
For the network-related ones, something like:
strace -ff -e trace=file,open,read,write,connect,sendto,recvfrom,sendmsg,recvmsg,poll -s 200 -o strace.log bcftools ...
You can also use tcpdump to inspect the network traffic.
Personally, I've been bitten so many times by long-running jobs failing due to network hiccups that I generally run them locally and then do the copy as a separate operation.
The code that changes verbosity in samtools is here. It just sets the hts_verbose variable that's exported by HTSlib.
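To illustrate the idea of "just setting hts_verbose", here is a minimal C sketch. It is not the actual samtools or bcftools code: the helpers apply_verbosity_option and parse_level are hypothetical, and the argv scan is hand-rolled for brevity (the real tools may well use getopt_long instead). hts_verbose is defined locally as a stand-in so the sketch compiles without linking HTSlib; in a real build you would include <htslib/hts.h> and drop that definition. The initial value of 3 is an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for HTSlib's exported global. A real build would
 * #include <htslib/hts.h> and drop this definition.
 * The initial value 3 is an assumption for this sketch. */
int hts_verbose = 3;

/* Convert a decimal string to a non-negative int.
 * Returns 0 on success, -1 on error. */
static int parse_level(const char *s, int *out)
{
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 0 || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

/* Hypothetical helper: find "--verbosity N" or "--verbosity=N" in argv
 * and store N in hts_verbose. Returns 0 if the option is absent or valid,
 * -1 if a value is missing or malformed. */
int apply_verbosity_option(int argc, char **argv)
{
    static const char prefix[] = "--verbosity=";
    const size_t plen = sizeof prefix - 1;   /* length without the NUL */

    for (int i = 1; i < argc; i++) {
        const char *val;
        if (strcmp(argv[i], "--verbosity") == 0) {
            if (i + 1 >= argc) {
                fprintf(stderr, "--verbosity requires a value\n");
                return -1;
            }
            val = argv[++i];                 /* consume the separate value */
        } else if (strncmp(argv[i], prefix, plen) == 0) {
            val = argv[i] + plen;            /* value follows the '=' */
        } else {
            continue;                        /* not our option */
        }
        int level;
        if (parse_level(val, &level) != 0) {
            fprintf(stderr, "invalid --verbosity value: '%s'\n", val);
            return -1;
        }
        hts_verbose = level;                 /* HTSlib reads this global */
    }
    return 0;
}
```

Calling something like this early in main(), before any input or output file is opened, should mean the new level is already in effect when the first HTTP request is made.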
Hi, this may be a bit late, but I just added an experimental (and for now hidden) option, bcftools view --hts-verbose. Can you try it out, please? If it works, we can list it on the usage page and add it to other commands as well.
Thank you, I'll check this out and validate!
Hmm. Am I misunderstanding how it should be used? I built locally from the latest develop branch, which includes this change, and tried:
./bcftools view gs://genomics-public-data/1000-genomes-phase-3/vcf/ALL.chr1.phase3_shapeit2_mvncall_integrated_v2.20130502.genotypes.vcf --hts-verbose 9 2> bcf_err.txt | less -S
I expected debug info to be written to bcf_err.txt, but as I scrolled through the VCF, none was generated. That Google Cloud URL is public and free if you'd like to test this exact command yourself.
Uh, there was a silly bug. Can you try again now, please? https://github.com/samtools/bcftools/commit/65cbaea1b7205bed87bf863ee7982a4df683f718
Yes! That does exactly what I was hoping for. It will be easier to get visibility into failures now, and determine when it's possible to use GCS support directly.
We use bcftools (and samtools and dozens of other tools that depend on htslib) in cloud environments where it's really advantageous to be able to read directly from object storage without copying files locally. I think that for huge files this won't be reliable unless the GCS client is improved, but it works well for smaller files today.
OK. I added the -v, --verbosity option to all commands and plugins. Hopefully this will be useful.