
Does STDEV work in 1.10.2?

Open ttbek opened this issue 4 years ago • 4 comments

Does STDEV definitely work in 1.10.2? -e 'FORMAT/DP[*:0] > MEAN(FORMAT/DP)' seems to work fine, but -e 'FORMAT/DP[*:0] > STDEV(FORMAT/DP)' filters out everything. Something like -e 'FORMAT/DP[*:0] > (30+STDEV(FORMAT/DP))' seems to use exactly 30 as the boundary, and -e 'FORMAT/DP[*:0] > (MEAN(FORMAT/DP)+STDEV(FORMAT/DP))' seems to filter exactly on the mean, as if the standard deviation were 0. For that site the standard deviation should have been almost 8. Does anyone have a working example?

I observe the same behavior in a beta version of 1.9, in 1.10.0, and 1.10.1.
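To make the expected boundaries concrete, here is a minimal Python sketch that takes one site's per-sample FORMAT/DP values and computes the threshold each of the expressions above should use, plus which samples exceed it. The helper names and the depths are hypothetical, and it assumes STDEV is the population standard deviation (statistics.stdev would give the n-1 version). If STDEV were wrongly evaluated as 0, the last two thresholds would collapse to exactly 30 and to the mean, which matches the symptoms described above.

```python
import statistics


def filter_thresholds(dp_values):
    """Expected boundaries for the -e expressions tried in this issue,
    given the per-sample FORMAT/DP values at a single site.

    Assumption: STDEV is the population standard deviation (divide by n).
    Swap in statistics.stdev for the sample (n-1) version.
    """
    mean = statistics.fmean(dp_values)
    sd = statistics.pstdev(dp_values)
    return {
        "MEAN(FORMAT/DP)": mean,
        "STDEV(FORMAT/DP)": sd,
        "30+STDEV(FORMAT/DP)": 30 + sd,
        "MEAN(FORMAT/DP)+STDEV(FORMAT/DP)": mean + sd,
    }


def samples_above(dp_values, threshold):
    """0-based indices of samples for which FORMAT/DP > threshold holds."""
    return [i for i, dp in enumerate(dp_values) if dp > threshold]


if __name__ == "__main__":
    # Hypothetical per-sample depths at one site (not the data from this issue).
    dps = [10, 20, 30, 40]
    for expr, value in filter_thresholds(dps).items():
        print(f"{expr:<35} {value:7.3f}  samples above: {samples_above(dps, value)}")
```

Comparing these numbers with which sites bcftools actually keeps or drops is a quick way to tell whether STDEV is being evaluated at all.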

A potential enhancement that would make it easier to see what is going on: allow filter parameters to be printed through a bcftools query format string, so we could check directly what value STDEV takes instead of finding the boundary by looking at which records are filtered and which aren't. Probably not exactly that, since it would print the same value an awful lot, but some more convenient way to check the values would be nice. Maybe output to stderr?
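Until something like that exists, one workaround is to dump the per-sample depths with bcftools query and compute the values outside bcftools. Below is a minimal Python sketch that reads such output and writes each site's MIN, MAX, MEAN, MEDIAN and STDEV to stderr, as suggested above. The script and input file names are placeholders, and three conventions are assumptions rather than confirmed bcftools behavior: missing values ('.') are skipped, MEDIAN averages the two middle values for an even count, and STDEV is the population standard deviation.

```python
# Usage (file names are placeholders):
#   bcftools query -f '%CHROM\t%POS[\t%DP]\n' input.vcf.gz | python3 dp_stats.py
import statistics
import sys


def site_stats(fields):
    """Summarise the per-sample DP columns of one query line.

    Skips missing values ('.'); whether bcftools does the same is an assumption.
    Returns None if the site has no DP values at all.
    """
    dps = [float(x) for x in fields if x != "."]
    if not dps:
        return None
    return {
        "MIN": min(dps),
        "MAX": max(dps),
        "MEAN": statistics.fmean(dps),
        "MEDIAN": statistics.median(dps),  # averages the two middle values for even n
        "STDEV": statistics.pstdev(dps),   # population SD (assumed convention)
    }


def format_line(line):
    """Turn 'CHROM<TAB>POS<TAB>DP1<TAB>DP2...' into a one-line summary."""
    chrom, pos, *dp_fields = line.rstrip("\n").split("\t")
    stats = site_stats(dp_fields)
    if stats is None:
        return f"{chrom}:{pos}\tno DP values"
    return f"{chrom}:{pos}\t" + "\t".join(f"{k}={v:.3f}" for k, v in stats.items())


def main():
    for line in sys.stdin:
        print(format_line(line), file=sys.stderr)


if __name__ == "__main__":
    main()
```

Writing to stderr mirrors the suggestion above; redirecting it to a file (2> stats.txt) keeps the summaries around for comparison with what the filter does.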

Regarding something mentioned in #985: removed functions should probably remain documented, even if all the entry says is something like AVG: Deprecated in version... If functions are removed outright, it becomes difficult to understand old examples floating around and to work out the modern equivalent (AVG to MEAN is pretty obvious, but other deprecations may not have such obvious replacements).

I'm probably just doing something wrong. Unfortunately I cannot share my data. If nobody knows of a working example I can look at, and this behavior can't easily be reproduced, I will try to construct a test case without real data.
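A test case without real data can be as small as one site. The minimal Python sketch below writes a single-record VCF whose only FORMAT field is DP; the sample names, contig length, position, alleles and the output file name stdev_test.vcf are all made-up placeholders. With the hypothetical depths 10, 10, 30, 30, the mean is 20 and the population standard deviation is 10, so MEAN+STDEV is 30 and no sample satisfies FORMAT/DP[*:0] > 30 (the sample standard deviation, about 11.5, gives the same outcome). Running bcftools view -e 'FORMAT/DP[*:0] > (MEAN(FORMAT/DP)+STDEV(FORMAT/DP))' stdev_test.vcf should therefore keep the site if STDEV works, whereas a STDEV evaluated as 0 lowers the boundary to 20 and the site is dropped. This assumes, as the symptoms above suggest, that a site is excluded when any sample matches.

```python
def make_test_vcf(depths, chrom="1", pos=100):
    """Return a minimal single-site VCF (as text) whose only FORMAT field is DP.

    Sample names S1..Sn, the contig length, position and alleles are
    hypothetical placeholders chosen only to produce a valid file.
    """
    samples = [f"S{i + 1}" for i in range(len(depths))]
    header = [
        "##fileformat=VCFv4.2",
        f"##contig=<ID={chrom},length=1000>",
        '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                   "FILTER", "INFO", "FORMAT"] + samples),
    ]
    # One biallelic record with missing ID/QUAL/FILTER/INFO and per-sample DP.
    record = "\t".join([chrom, str(pos), ".", "A", "C", ".", ".", ".", "DP"]
                       + [str(d) for d in depths])
    return "\n".join(header + [record]) + "\n"


if __name__ == "__main__":
    # Hypothetical depths: MEAN=20, population STDEV=10, so MEAN+STDEV=30.
    with open("stdev_test.vcf", "w") as fh:
        fh.write(make_test_vcf([10, 10, 30, 30]))
```

Because the file contains no real data, it can be attached to the issue directly.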

ttbek, Jun 24 '20 04:06

There were some fixes recently. Can you try the latest GitHub version and, if the problem persists, provide a small test case that reproduces the bug? http://samtools.github.io/bcftools/howtos/install.html

pd3, Jun 24 '20 05:06

Thanks for the swift response. Our data can't leave the cluster, and the cluster is largely kept off the internet (users connect through a VPN and then OVD, and things like wget and git clone have no internet access; I'm probably not supposed to look for ways around that), so I will need to ask the admins to install the new version. In the meantime, was there a previous working example?

ttbek, Jun 24 '20 05:06

I found another file with a similar structure to test on locally. Version 1.10.2 with htslib 1.10.2 is definitely wrong. 1.10.2-86-ge313e0f with htslib 1.10.2-86-g313a9fc appears to work correctly, at least for the local file, based on my eyeballing so far (not rigorous, but the results make sense at a glance). I'll ask the admins to install the latest version and try it on the other file. Should the manual be updated to reflect this, since it currently documents 1.10.2?

ttbek, Jun 24 '20 07:06

1.10.2-86-ge313e0f with htslib 1.10.2-86-g313a9fc also works correctly on the original file. Feel free to close this as soon as the documentation reflects that STDEV does not actually work in the 1.10.2 release.

The versioned documentation implies that it is available (and hence that it works): http://www.htslib.org/doc/bcftools.html#expressions

Alternatively, should I close this and file a separate issue at https://github.com/samtools/www.htslib.org for a documentation update? I would make the pull request myself, but I'm not sure whether I should change more than just STDEV: I have only tried MEAN and STDEV, so I don't know whether MIN, MAX, MEDIAN, etc. were already working in 1.10.2 or, like STDEV, only work after the recent fixes.

If possible, it may also be worth editing the 1.10.0 release text: https://github.com/samtools/bcftools/releases/tag/1.10. It could also throw people off, but it isn't as critical as the official documentation.

ttbek, Jun 28 '20 08:06