
bcftools isec does not support stdin input

Open tommycarstensen opened this issue 10 years ago • 4 comments

This is an enhancement request. It would be great if bcftools isec could support stdin input like this:

bcftools isec -C <(bcftools view [OPTIONS] $vcf1) <(bcftools view [OPTIONS] $vcf2)

tommycarstensen avatar Jul 27 '15 16:07 tommycarstensen

That syntax uses bash's process substitution feature (http://tldp.org/LDP/abs/html/process-sub.html), so the problem isn't lack of support for standard streams. It is, I believe, that isec requires indexed input.
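
To illustrate the point about process substitution, here is a minimal bash sketch. It assumes bash on a Unix-like system, and `input_kind` is a hypothetical helper, not part of bcftools. It shows that `<(...)` expands to a path backed by a pipe, which a tool cannot seek in or find an index file next to, whereas a regular file on disk can have an index beside it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a path the way an index-requiring tool
# would experience it.
input_kind() {
    if [ -p "$1" ]; then
        echo "pipe"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

# Process substitution expands to a path (for example /dev/fd/63) that is
# backed by a pipe, so there is nothing on disk to index.
input_kind <(printf 'hello\n')

# A file on disk is seekable, so an index file could sit next to it.
tmp=$(mktemp)
input_kind "$tmp"
rm -f "$tmp"
```

On a typical Linux system the first call should report a pipe and the second a regular file, which fits the reading that isec's complaint is about indexing rather than about streams as such.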

-Kevin

bioinformed avatar Jul 27 '15 16:07 bioinformed

It seems that isec still doesn't support real standard input though. For example:

[anovak@kolossus vg]$ bcftools view gbwt-experiment/21/gbwt-graphs-v3/slls/1kg_hg19-CHR21.vcf.gz --private --samples HG00096 --force-samples --output-type z | bcftools isec --complement gbwt-experiment/21/gbwt-graphs-v3/slls/1kg_hg19-CHR21.vcf.gz - --write 1 --output-type z | bcftools view - --samples ^HG00096 --force-samples --output-type z >~/hive/trash/allbut.vcf.gz
Failed to open -: could not load index
Failed to open -: unknown file type
[anovak@kolossus vg]$ bcftools view gbwt-experiment/21/gbwt-graphs-v3/slls/1kg_hg19-CHR21.vcf.gz --private --samples HG00096 --force-samples | bcftools isec --complement gbwt-experiment/21/gbwt-graphs-v3/slls/1kg_hg19-CHR21.vcf.gz - --write 1 --output-type z | bcftools view - --samples ^HG00096 --force-samples --output-type z >~/hive/trash/allbut.vcf.gz
Failed to open -: not compressed with bgzip
Failed to open -: unknown file type

I'm trying to do isec in a pipeline, and it complains that the input is not indexed (when I send it compressed VCF) or not compressed (when I send it uncompressed VCF).

Since it's impossible to index a stream, isec should either bail out as soon as it sees that it is being told to read a stream, or somehow support non-indexed input when reading from a stream.
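
In the meantime, one possible workaround is to spool each stream to a temporary bgzipped file and index it before calling isec. The sketch below assumes bash and an installed bcftools. `spool_and_index` and its two arguments are hypothetical names chosen for illustration, and the approach costs extra disk space and I/O compared with a true pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a VCF stream on stdin, write it as bgzipped VCF
# to DIR/NAME.vcf.gz, index it, and print the resulting path.
spool_and_index() {
    local out="$1/$2.vcf.gz"
    bcftools view --output-type z --output "$out" - || return 1
    bcftools index "$out" || return 1
    echo "$out"
}

# Example use (assumes $vcf1 and $vcf2 exist; add view options as needed):
#   tmp=$(mktemp -d)
#   a=$(bcftools view "$vcf1" | spool_and_index "$tmp" a)
#   b=$(bcftools view "$vcf2" | spool_and_index "$tmp" b)
#   bcftools isec -C "$a" "$b" --write 1 --output-type z > result.vcf.gz
#   rm -rf "$tmp"
```

The function only prints the path once both the write and the index step have succeeded, so a failed step leaves the caller with an empty variable and a non-zero exit status rather than an unindexed file.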

adamnovak avatar Mar 29 '18 18:03 adamnovak

Mmm, the attempted fix does not work universally: isatty does not behave as expected on some systems.

pd3 avatar Mar 10 '23 08:03 pd3

The only system I know of where it's broken is MinGW/MSYS, where it sometimes works and sometimes doesn't, depending on the environment. There are a number of exceptions in the samtools usage tests for tools which, when run with no arguments, just hang instead of reporting usage.

https://github.com/samtools/samtools/blob/develop/test/test.pl#L975-L977

That's about the best solution we could come up with as it's a minor problem and simply ignoring the tests was easy enough.
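
For readers unfamiliar with the terminal check being discussed, here is a shell-level analogue, not the C code from the attempted fix. `[ -t 0 ]` is the shell counterpart of calling isatty on standard input, and `read_or_usage` is a hypothetical function name. The sketch prints usage when it is run with no arguments from an interactive terminal, and otherwise reads the named file or stdin.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: with no arguments and a terminal on stdin, report
# usage instead of blocking; otherwise read the named file, or stdin.
read_or_usage() {
    if [ "$#" -eq 0 ] && [ -t 0 ]; then
        echo "usage: read_or_usage [FILE]" >&2
        return 1
    fi
    # "-" tells cat to read standard input when no file was given.
    cat "${1:--}"
}
```

If the terminal check misreports, as described for MinGW/MSYS, a wrapper like this falls through to the read and waits on the terminal, which is the hang the test exceptions work around.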

jkbonfield avatar Mar 10 '23 09:03 jkbonfield