
Add --verbosity flag like samtools

Open pettyalex opened this issue 1 year ago • 1 comments

I've been using bcftools with Google Cloud Storage support, and it would be very useful to be able to pass a verbosity level through to htslib in order to understand how the HTTP requests are working.

It seems that although samtools has a way to set verbosity, bcftools does not.

pettyalex avatar Jul 25 '24 20:07 pettyalex

I'll be more specific: I'm trying to understand failures when writing to Google Cloud Storage, and struggling to produce any log messages at all.

Specifically, I'm invoking bcftools like:

export GCS_OAUTH_TOKEN=$(gcloud auth print-access-token)
export GCS_REQUESTER_PAYS_PROJECT=my-project

bcftools view -i 'FMT/GQ>=20 & FMT/AD > 10 & F_MISSING <= 0.1' \
  -q 0.01 -f PASS -Ob1 -o gs://my-bucket/output.bcf --threads $(nproc) gs://source-bucket/input.vcf.gz && \
  bcftools index -f gs://my-bucket/output.bcf

I'm seeing intermittent failures and need visibility into what's happening. The job runs successfully for many hours at a time. Sometimes it is able to write out the whole file, and sometimes it fails with only this message: [main_vcfview] Error: cannot write to gs://my-bucket/output.bcf

I am confident that my tokens are valid and able to read/write to those buckets, because I am consistently able to read/write small files with those tokens.

pettyalex avatar Jul 30 '24 16:07 pettyalex

@pettyalex - strace may help here; it can give you visibility into system calls.

For the network-related ones, something like:

strace -ff -e trace=file,open,read,write,connect,sendto,recvfrom,sendmsg,recvmsg,poll -s 200 -o strace.log bcftools ...

You can also use tcpdump to inspect the network traffic.

Personally, I've been bitten so many times by long-running jobs failing due to network hiccups that I generally run them locally, then do the copy as a separate operation.
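
As a rough sketch of that "write locally, then copy separately" approach: the retry helper below is hypothetical (it is not part of bcftools or htslib), and the commented usage assumes gsutil is installed and that output.bcf and its index fit on local disk. The filter options from the original command are omitted for brevity.

```shell
# Hypothetical helper: run a command, retrying up to MAX times with a fixed delay.
# Usage: retry MAX DELAY_SECONDS COMMAND [ARGS...]
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  # Re-run the command until it succeeds or the attempt budget is used up.
  until "$@"; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "retry: giving up after $retry_attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $retry_attempt failed; sleeping ${retry_delay}s" >&2
    sleep "$retry_delay"
    retry_attempt=$((retry_attempt + 1))
  done
}

# Example usage (assumed bucket names taken from the command earlier in the thread):
#   bcftools view -Ob -o output.bcf gs://source-bucket/input.vcf.gz
#   bcftools index -f output.bcf
#   retry 5 30 gsutil cp output.bcf output.bcf.csi gs://my-bucket/
```

Keeping the upload as its own step means a network hiccup only costs a re-copy, not a re-run of the whole filtering job.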

davmlaw avatar Apr 11 '25 01:04 davmlaw

The code that changes verbosity in samtools is here. It just sets the hts_verbose variable that's exported by HTSlib.

daviesrob avatar Apr 24 '25 10:04 daviesrob

Hi, this may be a bit late, but I just added an experimental (and for now hidden) option, bcftools view --hts-verbose. Can you try it out please? If it works, we can list it on the usage page and add it to other commands as well.

pd3 avatar May 07 '25 14:05 pd3

Thank you, I'll check this out and validate!

pettyalex avatar May 07 '25 19:05 pettyalex

Hmm. Am I misunderstanding how it would be used? I built locally from the latest develop, which includes this change, and tried

./bcftools view gs://genomics-public-data/1000-genomes-phase-3/vcf/ALL.chr1.phase3_shapeit2_mvncall_integrated_v2.20130502.genotypes.vcf --hts-verbose 9 2> bcf_err.txt | less -S

My assumption is that debug info would be written to bcf_err.txt, but as I scrolled through the VCF I saw none generated. That Google Cloud URL is public and free if you'd like to test this exact command yourself.

pettyalex avatar May 07 '25 19:05 pettyalex

Uh, there was a silly bug. Can you try now please? https://github.com/samtools/bcftools/commit/65cbaea1b7205bed87bf863ee7982a4df683f718

pd3 avatar May 12 '25 13:05 pd3

Yes! That does exactly what I was hoping for. It will be easier to get visibility into failures now, and determine when it's possible to use GCS support directly.

We use bcftools (and samtools and dozens of other tools that depend on htslib) in cloud environments where it's really advantageous to be able to read directly from object storage without copying files locally. I think that for huge files this won't be reliable unless the GCS client is improved, but it works well for smaller files right now.

pettyalex avatar May 12 '25 16:05 pettyalex

OK. I added the -v, --verbosity option to all commands and plugins. Hopefully this will be useful.

pd3 avatar May 20 '25 16:05 pd3