bcftools icon indicating copy to clipboard operation
bcftools copied to clipboard

Runtime tests are failing on several hardware architectures

Open tillea opened this issue 6 years ago • 12 comments

Hi, the Debian package of bcftools received a bug report pointing to lots of build time test failures for several architectures. For instance you can check a log of a build on i386 (just seek for the string ".. failed ..."). There is an overview page about all architectures which leads to the conclusion that there are different types of failures for different architectures (wild guessing from different numbers of failures per some architectures while other architectures show the same numver failures). While most probably bcftools is mostly run on amd64 architecture and also arm it is comiling fine this kind of errors typically are hints for issues in the code that deserves fixing in general. Please note that while the bug report is against version 1.3.1 the logs are belonging also to version 1.5.4. Kind regards, Andreas.

tillea avatar Dec 11 '17 12:12 tillea

When I was doing the windows port a huge number of bcftools failures were down to variation in the last digit of a floating point number - ie rounding differences. I haven't looked at this particular issue, but we really need a more robust comparison mechanism to determine equality.

The other thing we need to be careful of is assuming zlib. There are many equally valid deflate streams, but IIRC using say the intel deflate causes the test harness to fail too.

jkbonfield avatar Dec 11 '17 12:12 jkbonfield

Hi James,

On Mon, Dec 11, 2017 at 12:49:34PM +0000, James Bonfield wrote:

When I was doing the windows port a huge number of bcftools failures were down to variation in the last digit of a floating point number - ie rounding differences. I haven't looked at this particular issue, but we really need a more robust comparison mechanism to determine equality.

The observed test failures would fit to this observation at least. I fully agre that more robust floating point comparison should be the first means to takle this.

tillea avatar Dec 11 '17 13:12 tillea

A simple and quick hack for now is to modify test.pl to round float values first before comparing.

pd3 avatar Dec 11 '17 15:12 pd3

On Mon, Dec 11, 2017 at 03:04:18PM +0000, pd3 wrote:

A simple and quick hack for now is to modify test.pl to round float values first before comparing. I admit that sounds promising, but I'd leave this solution rather to upstream than fiddling around in code I never touched before.

tillea avatar Dec 11 '17 19:12 tillea

Thinking about it more, this is actually the right place to do it. The program works correctly, these rounding problems are just an artifact of evaluation. The rounding would be done on the strings entering the comparison here https://github.com/pd3/bcftools/blob/develop/test/test.pl#L426

pd3 avatar Dec 12 '17 09:12 pd3

Agreed @tillea this is something for us to do. There may be code already for handling this for Windows; eg lines 450ish in https://github.com/samtools/bcftools/pull/609/files?diff=split#diff-5e25dd32ec9d21cde3301825bf56195c

That was mainly for how exponents are printed, but it would be easy enough to do a regexp matching exponentials and reformat using a limited precision printf command.

I'm not sure ALL of the errors will be this nature though. I suspect there are other things which are failures-in-waiting, such as relying on zlib output to always match (it may not if we used another zlib library). We need to build a virtual machine somewhere I guess to test these whacky platforms.

jkbonfield avatar Dec 12 '17 09:12 jkbonfield

Just for the sake of completeness: I've built bcftools version 1.6 for Debian experimental and the issue has not changed in the latest version.

tillea avatar Dec 13 '17 07:12 tillea

Can you try with this latest commit please? I expect some of the failures might be fixed by this. For the rest, it would be helpful to see all the /<<PKGBUILDDIR>>/test/*.new files created by the test script. Thank you

pd3 avatar Dec 13 '17 13:12 pd3

On Wed, Dec 13, 2017 at 05:08:45AM -0800, pd3 wrote:

Can you try with this latest commit please? I expect some of the failures might be fixed by this. For the rest, it would be helpful to see all the /<<PKGBUILDDIR>>/test/*.new files created by test script. Thanks for the patch. Unfortunately it has not changed the results at all (guessed from the number of the failed tests of the architectures).

tillea avatar Dec 14 '17 07:12 tillea

2020 update: tests still fail on s390x, a big-endian architecture

https://github.com/samtools/htslib/pull/1023#issuecomment-587518184

https://buildd.debian.org/status/fetch.php?pkg=bcftools&arch=s390x&ver=1.10.2-2&stamp=1581632647&raw=0

Number of tests: total .. 961 passed .. 936 failed .. 25

One example, as noted in https://github.com/samtools/htslib/issues/355#issuecomment-553366446

test_naive_concat:
	/<<PKGBUILDDIR>>/bcftools concat --naive-force /tmp/kUdZ7xZSaN/naive_concat.0.bcf /tmp/kUdZ7xZSaN/naive_concat.1.bcf /tmp/kUdZ7xZSaN/naive_concat.2.bcf /tmp/kUdZ7xZSaN/naive_concat.3.bcf /tmp/kUdZ7xZSaN/naive_concat.4.bcf /tmp/kUdZ7xZSaN/naive_concat.5.bcf /tmp/kUdZ7xZSaN/naive_concat.6.bcf /tmp/kUdZ7xZSaN/naive_concat.7.bcf /tmp/kUdZ7xZSaN/naive_concat.8.bcf /tmp/kUdZ7xZSaN/naive_concat.9.bcf | /<<PKGBUILDDIR>>/bcftools view -H

	Non-zero status 255

		Concatenating /tmp/kUdZ7xZSaN/naive_concat.0.bcfFailed to read from standard input: unknown file type
		

.. failed ...

mr-c avatar Feb 20 '20 07:02 mr-c

End of 2020 update: tests still fail on big-endian architectures like s390x, ppc64 and others which you can chech here.

tillea avatar Dec 17 '20 08:12 tillea