test-charpoly-check failing on sparc64

Open d-torrance opened this issue 8 years ago • 23 comments

Hello!

I just packaged fflas-ffpack 2.2.2 for Debian, and it is failing to build on several architectures due to failing tests. (The following are pasted from the mips build log [1].)

FAIL: test-lu
=============
[...]
Checking ..............Modular<Integer> modulo 29 ... rank is wrong (expecting 11 but got 0)
rank is wrong (expected 11 but got 0)
failed at big lda
rank is wrong (expecting 20 but got 0)
rank is wrong (expected 20 but got 0)
failed at big lda max rank
failed at big lda, rank 0
rank is wrong (expecting 10 but got 0)
rank is wrong (expected 10 but got 0)
failed at square
rank is wrong (expecting 15 but got 0)
rank is wrong (expected 15 but got 0)
failed at wide
rank is wrong (expecting 7 but got 0)
rank is wrong (expected 7 but got 0)
failed at narrow
rank is wrong (expecting 11 but got 0)
rank is wrong (expected 11 but got 0)
failed at big lda
rank is wrong (expecting 20 but got 0)
rank is wrong (expected 20 but got 0)
failed at big lda max rank
failed at big lda, rank 0
rank is wrong (expecting 10 but got 0)
rank is wrong (expected 10 but got 0)
failed at square
rank is wrong (expecting 15 but got 0)
rank is wrong (expected 15 but got 0)
failed at wide
rank is wrong (expecting 7 but got 0)
rank is wrong (expected 7 but got 0)
failed at narrow
rank is wrong (expecting 11 but got 0)
rank is wrong (expected 11 but got 0)
failed at big lda
rank is wrong (expecting 20 but got 0)
rank is wrong (expected 20 but got 0)
failed at big lda max rank
failed at big lda, rank 0
rank is wrong (expecting 10 but got 0)
rank is wrong (expected 10 but got 0)
failed at square
rank is wrong (expecting 15 but got 0)
rank is wrong (expected 15 but got 0)
failed at wide
rank is wrong (expecting 7 but got 0)
rank is wrong (expected 7 but got 0)
failed at narrow
rank is wrong (expecting 11 but got 0)
rank is wrong (expected 11 but got 0)
failed at big lda
rank is wrong (expecting 20 but got 0)
rank is wrong (expected 20 but got 0)
failed at big lda max rank
failed at big lda, rank 0
rank is wrong (expecting 10 but got 0)
rank is wrong (expected 10 but got 0)
failed at square
rank is wrong (expecting 15 but got 0)
rank is wrong (expected 15 but got 0)
failed at wide
rank is wrong (expecting 7 but got 0)
rank is wrong (expected 7 but got 0)
failed at narrow
FAILED

FAIL: test-echelon
==================
Checking ...............Modular<double> mod 76667 .........PASSED 
Checking ..................Modular<double> mod 23 .........PASSED 
Checking ..................Modular<double> mod 89 .........PASSED 
Checking ......ModularBalanced<double> mod 561181 .........PASSED 
Checking ..........ModularBalanced<double> mod 31 .........PASSED 
Checking ........ModularBalanced<double> mod 2503 .........PASSED 
Checking ..................Modular<float> mod 223 .........PASSED 
Checking ..................Modular<float> mod 151 .........PASSED 
Checking ..................Modular<float> mod 359 .........PASSED 
Checking .........ModularBalanced<float> mod 1283 .........PASSED 
Checking ..........ModularBalanced<float> mod 421 .........PASSED 
Checking .........ModularBalanced<float> mod 1259 .........PASSED 
Checking .Modular<int32_t, uint32_t> modulo 11003 .........PASSED 
Checking ....Modular<int32_t, uint32_t> modulo 29 .........PASSED 
Checking .....Modular<int32_t, uint32_t> modulo 3 .........PASSED 
Checking ......ModularBalanced<int32_t> mod 13499 .........PASSED 
Checking .........ModularBalanced<int32_t> mod 73 .........PASSED 
Checking .......ModularBalanced<int32_t> mod 6871 .........PASSED 
Checking .....Modular<int64_t, int64_t> modulo 43 .........PASSED 
Checking Modular<int64_t, int64_t> modulo 52689971 .........PASSED 
Checking Modular<int64_t, int64_t> modulo 820673699 .........PASSED 
Checking .......ModularBalanced<int64_t> mod 7433 .........PASSED 
Checking .....ModularBalanced<int64_t> mod 359663 .........PASSED 
Checking .....ModularBalanced<int64_t> mod 107137 .........PASSED 
FAIL test-echelon (exit status: 139)

FAIL: test-rankprofiles
=======================
[...]
Checking Modular<Integer> modulo 272998032472030762247254850999851950143 ... FAILED 
FAIL test-rankprofiles (exit status: 1)

FAIL: test-fgemm
================
[...]
Checking Modular<Integer> modulo 8549871607103756297543434634416548303828878605453302128157720522884613235851910725097929051316586481512924427759295421112023965613109239381708161612926577 ... FAIL
a   :1, b   : 0
m   :15, n   : 26, k   : 26
ldA :32, ldB : 32, ldC : 30
Error C[0,0]=0 D[0,0]=467492732658410772275091190117627655542253651983001388578517645829495781150794733684597850114989973765636043935395580403988600206394352625839543034249789
Error C[0,1]=0 D[0,1]=3745850637687372787032133162786181752923976304361774184687299100090593606189640549339756280744799085424074197777123010114450581511671044936589874969831713
Error C[0,2]=0 D[0,2]=8286404725316917254688714949260944254011537941467078201617202765729213187054127467488830363130801792602756153724938361376488366188203128350292301369873926
Error C[0,3]=0 D[0,3]=7719842182036168601239979864756689654896968016098854378423193365280462943223639927166648047194689950133225952138330062822670878063601456021183476952591029
Error C[0,4]=0 D[0,4]=2024153580475532119587258559457193287346898570762148019971431860242110513024559260011015946501434441705820455529574060348920621050772638304443142900509912
Error C[0,5]=0 D[0,5]=20141380879160588302903723680879607892681391439496049917031962141045240109793641500262038502579839897910979174361960938528280325891868479883106216500846
Error C[0,6]=0 D[0,6]=4391623757211580394981142711365624473283723132004364468578042038987954220345931201010465272737045531053542896869796518680065036563453611907149726297278828
Error C[0,7]=0 D[0,7]=7145044132612825887529996517477527837989145729527347602699814003707490049064275886634152750968051932467580700126399656627669145655997899348019974398331288
Error C[0,8]=0 D[0,8]=1162056415306067048056344432602188553248849536319725726925267530516646030835101050918971464286388162765976112658166266823492179915873703334802593027996927
Error C[0,9]=0 D[0,9]=5391984716822375185454705906327368550846001455505002431488537190586214905645659123376661256794728494606528351826240719416122998491862837842906524825067266
Error C[0,10]=0 D[0,10]=2193532197288498574734861369684182498055535867358417603399364153343391698691030762288301969500710193967693347886190582912066842231271246096182892095605959
Error C[0,11]=0 D[0,11]=2624532911059920179180095706707244358148606542398416052415347939403828134468523920604669737503086596630490776845197767571758276278278191461009633193828347
Error C[0,12]=0 D[0,12]=3824058857929448047878016258319328821213902355230152570114023373874669070193964224840157989142795725361058539846319450404858289844120222578875049028092694
Error C[0,13]=0 D[0,13]=8488926214544649928946158289983257571986404239711143898331161061672932983731253748159571045407716898119928713472262377738179411236640573023199132821771987
Error C[0,14]=0 D[0,14]=8446230141577834220808492590599058489726893360083666083959660498179870853695198490739565290083853052922584335433857016447280120146953798248293970266650541
Error C[0,15]=0 D[0,15]=5458897513768857520249369908248619852903197296497582007240795265534584137713653353959244315841525059439028312283991843795889213682520912258742727419218179
Error C[0,16]=0 D[0,16]=6715744657584867924437252754908658736316361517364899772634395902309515686261743192031073767359337282039814832752313092806684013255350004845231712908981814
Error C[0,17]=0 D[0,17]=1180550868429558577387803899302922946040118199804321268305374976108423786550517468434261788762720736487189789785499144845987141527162494605060510016163013
Error C[0,18]=0 D[0,18]=5068758751300120035784900616768905386216966948934546078944345181003163460523368240846960084376983129885419806701841041251611064084608879583409778801011039
Error C[0,19]=0 D[0,19]=8516674032910371447473511292247098025582897666097326091212928806484548818841348645545236340701615335664105781567594799896950348911481513472413885443418184
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXX
FAIL

The same tests fail on powerpc [2], s390x [3], and hppa [4].

On sparc64 [5], all of the above tests fail in addition to:

FAIL: test-pluq-check
=====================
terminate called after throwing an instance of 'FailureTrsmCheck'
FAIL test-pluq-check (exit status: 134)

On armel [6], only test-rankprofiles and test-fgemm fail.

Thank you!

[1] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=mips&ver=2.2.2-1&stamp=1472626266
[2] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=powerpc&ver=2.2.2-1&stamp=1472625237
[3] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=s390x&ver=2.2.2-1&stamp=1472625205
[4] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=hppa&ver=2.2.2-1&stamp=1472627635
[5] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=sparc64&ver=2.2.2-1&stamp=1472625542
[6] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=armel&ver=2.2.2-1&stamp=1472627148

d-torrance avatar Aug 31 '16 13:08 d-torrance

From our Fedora builds - the test-lu enters an endless loop even

sharkcz avatar Oct 04 '16 11:10 sharkcz

These test failures have now been reported as a "release critical" bug, i.e., fflas-ffpack won't be included in the next Debian release unless they're fixed.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=840454
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=840455 (sparc64 test-pluq-check only)

d-torrance avatar Oct 11 '16 21:10 d-torrance

In the case of PowerPC, test-lu is stuck at:

Program received signal SIGINT, Interrupt.
0x00003fffb6f8fb40 in .__floorf_power5plus () from /lib64/power8/libm.so.6
(gdb) bt
#0 0x00003fffb6f8fb40 in .__floorf_power5plus () from /lib64/power8/libm.so.6
#1 0x000000002002826c in std::floor (__x=) at /usr/include/c++/6.2.1/cmath:284
#2 Givaro::invext (b=, a=) at /usr/include/givaro/modular-general.inl:74
#3 Givaro::ModularBalanced::inv (this=0x20156c50, r=@0x3fffffffdf64: 0, a=) at /usr/include/givaro/modular-balanced-float.inl:70
#4 0x00000000200d5264 in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=, A=0x201672b0, lda=130, P=0x20175c60, Q=0x20176048, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:494
#5 0x00000000200d495c in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=118, A=0x201672b0, lda=130, P=0x20175c60, Q=0x20176048, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#6 0x00000000200d49c4 in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=119, A=0x20166e9c, lda=130, P=0x20175c58, Q=0x20176038, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:576
#7 0x00000000200d49c4 in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=120, A=0x20166880, lda=130, P=0x20175c50, Q=0x20176020, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:576
#8 0x00000000200d495c in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=120, A=0x20166880, lda=130, P=0x20175c50, Q=0x20176020, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#9 0x00000000200d495c in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=120, A=0x20166880, lda=130, P=0x20175c50, Q=0x20176020, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#10 0x00000000200d495c in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=120, A=0x20166880, lda=130, P=0x20175c50, Q=0x20176020, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#11 0x00000000200d495c in FFPACK::LUdivineGivaro::ModularBalanced ( F=..., Diag=, trans=, M=, N=120, A=0x20166880, lda=130, P=0x20175c50, Q=0x20176020, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#12 0x00000000200d5750 in test_LUdivineGivaro::ModularBalanced<float, (FFLAS::FFLAS_DIAG)132, (FFLAS::FFLAS_TRANSPOSE)111> (F=..., A=0x201574b0, lda=130, r=70, m=120, n=120) at test-lu.C:106
#13 0x00000000200d5fc8 in launch_testGivaro::ModularBalanced<float, (FFLAS::FFLAS_DIAG)132, (FFLAS::FFLAS_TRANSPOSE)111> (F=..., r=70, m=120, n=120) at test-lu.C:838
#14 0x00000000200da284 in run_with_fieldGivaro::ModularBalanced ( q=..., b=0, m=120, n=120, r=70, iters=) at test-lu.C:1039
#15 0x000000002001c598 in main (argc=, argv=) at test-lu.C:1092

r4f4 avatar Oct 19 '16 12:10 r4f4

I ran the test again, but this time I compiled the package with debugging enabled. This is the output:

(gdb) bt
#0 0x00003fffb6c5f818 in .raise () from /lib64/power8/libc.so.6
#1 0x00003fffb6c61f64 in .abort () from /lib64/power8/libc.so.6
#2 0x00003fffb70f0b84 in .__gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00003fffb70ed484 in ?? () from /lib64/libstdc++.so.6
#4 0x00003fffb70ed528 in .std::terminate() () from /lib64/libstdc++.so.6
#5 0x00003fffb70ed94c in .__cxa_throw () from /lib64/libstdc++.so.6
#6 0x0000000010062d10 in FFLAS::CheckerImplem_fgemm<Givaro::Modular<float, float> >::check (C=0x101b6a84, ldb=130, B=0x101b687c, lda=130, A=0x101b6a80, alpha=112, tb=FFLAS::FflasNoTrans, ta=, this=) at ../fflas-ffpack/checkers/checker_fgemm.inl:86
#7 FFLAS::fgemm<Givaro::Modular<float, float> > (F=..., ta=, tb=, m=1, n=, k=, alpha=, A=0x101b6a80, lda=130, B=0x101b687c, ldb=130, beta=, C=0x101b6a84, ldc=130) at ../fflas-ffpack/fflas/fflas_fgemm.inl:344
#8 0x00000000100f61a0 in FFPACK::LUdivine<Givaro::Modular<float, float> > ( F=..., Diag=, trans=, M=, N=, A=0x101b6878, lda=130, P=0x101c5a40, Q=0x101c5e18, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:571
#9 0x00000000100f5c88 in FFPACK::LUdivine<Givaro::Modular<float, float> > ( F=..., Diag=, trans=, M=, N=, A=0x101b6670, lda=130, P=0x101c5a40, Q=0x101c5e10, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:576
#10 0x00000000100f5c18 in FFPACK::LUdivine<Givaro::Modular<float, float> > ( F=..., Diag=, trans=, M=, N=, A=, lda=, P=, Q=0x101c5e10, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#11 0x00000000100f5c18 in FFPACK::LUdivine<Givaro::Modular<float, float> > ( F=..., Diag=, trans=, M=, N=, A=, lda=, P=, Q=0x101c5e10, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#12 0x00000000100f5c18 in FFPACK::LUdivine<Givaro::Modular<float, float> > ( F=..., Diag=, trans=, M=, N=, A=, lda=, P=, Q=0x101c5e10, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#13 0x00000000100f5c18 in FFPACK::LUdivine<Givaro::Modular<float, float> > ( F=..., Diag=, trans=, M=, N=, A=, lda=, P=, Q=0x101c5e10, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#14 0x00000000100f5c18 in FFPACK::LUdivine<Givaro::Modular<float, float> > ( F=..., Diag=, trans=, M=, N=, A=, lda=, P=, Q=0x101c5e10, LuTag=FFPACK::FfpackSlabRecursive, cutoff=0) at ../fflas-ffpack/ffpack/ffpack_ludivine.inl:552
#15 0x00000000100f66c0 in test_LUdivine<Givaro::Modular<float, float>, (FFLAS::FFLAS_DIAG)132, (FFLAS::FFLAS_TRANSPOSE)111> (F=..., A=0x101a72a0, lda=130, r=70, m=120, n=120) at test-lu.C:106
#16 0x00000000100fb1d8 in launch_test<Givaro::Modular<float, float>, (FFLAS::FFLAS_DIAG)132, (FFLAS::FFLAS_TRANSPOSE)111> (F=..., r=70, m=120, n=120) at test-lu.C:838
#17 0x00000000100fc784 in run_with_field<Givaro::Modular<float, float> > ( q=..., b=0, m=120, n=120, r=70, iters=) at test-lu.C:1039
#18 0x0000000010007b14 in main (argc=, argv=) at test-lu.C:1090

r4f4 avatar Oct 24 '16 13:10 r4f4

This is the error when running the tests without optimization on the s390x architecture:

g++ -DHAVE_CONFIG_H -I. -I.. -I.. -g -I../fflas-ffpack/ -I../fflas-ffpack/utils/ -I../fflas-ffpack/fflas/ -I../fflas-ffpack/ffpack -I../fflas-ffpack/field -Wdate-time -D_FORTIFY_SOURCE=2 -O0 -Wall -DNDEBUG -UFFLASFFPACK_DEBUG -std=gnu++11 -D__FFLASFFPACK_HAVE_CBLAS -fopenmp -g -fdebug-prefix-map=/home/thansen/fflas-ffpack-2.2.2=. -fstack-protector-strong -Wformat -Werror=format-security -fabi-version=6 -c -o test-lu.o test-lu.C
../fflas-ffpack/utils/bit_manipulation.h: Assembler messages:
../fflas-ffpack/utils/bit_manipulation.h:114: Error: Unrecognized opcode: `divq'
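
For context, `divq` is an x86-64 instruction, so inline assembly that emits it cannot be assembled on s390x. The sketch below shows one common way to guard such code; it is not the contents of bit_manipulation.h. The helper name `div128by64` is hypothetical, and the fallback assumes a compiler that provides `unsigned __int128` (GCC and Clang do on 64-bit targets).

```cpp
#include <cstdint>

// Hypothetical helper (not from fflas-ffpack): divides the 128-bit value
// (hi:lo) by d, returning the quotient and storing the remainder in rem.
// Precondition: hi < d, so that the quotient fits in 64 bits.
inline std::uint64_t div128by64(std::uint64_t hi, std::uint64_t lo,
                                std::uint64_t d, std::uint64_t &rem) {
#if defined(__x86_64__) && defined(__GNUC__)
    // x86-64 only: divq divides rdx:rax by the operand,
    // leaving the quotient in rax and the remainder in rdx.
    std::uint64_t q;
    __asm__("divq %4"
            : "=a"(q), "=d"(rem)
            : "0"(lo), "1"(hi), "rm"(d)
            : "cc");
    return q;
#elif defined(__SIZEOF_INT128__)
    // Portable fallback for other 64-bit targets (s390x, ppc64, sparc64, ...).
    unsigned __int128 n = (static_cast<unsigned __int128>(hi) << 64) | lo;
    rem = static_cast<std::uint64_t>(n % d);
    return static_cast<std::uint64_t>(n / d);
#else
#error "this sketch needs either x86-64 inline asm or unsigned __int128"
#endif
}
```

With a guard of this kind the x86-64 path is unchanged and every other architecture takes the compiler-generated division.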

tobihan avatar Nov 11 '16 22:11 tobihan

The PPC64 build is stuck because NaNs have somehow gotten into the matrix. I haven't tracked that part down yet, but execution is stuck in the loop in invext at /usr/include/givaro/modular-general.inl, lines 66 through 87, because computation with NaNs just yields more NaNs, so v3 never converges to zero.
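
The failure mode can be sketched with a floating-point extended Euclidean loop. This is an illustration, not Givaro's invext: the function name `inv_mod` and its signature are hypothetical. The loop runs while `v3 != 0.0`, and since every comparison of a NaN with `!=` is true, a NaN operand would keep it spinning forever; the `std::isfinite` guard at the top is one possible way to fail fast instead.

```cpp
#include <cmath>

// Hypothetical sketch (not Givaro's code): computes r with a * r == 1 (mod p)
// using an extended Euclidean loop on doubles. Returns false if the input is
// not finite or a is not invertible modulo p.
bool inv_mod(double a, double p, double &r) {
    // Without this guard, a NaN input makes v3 NaN after one iteration,
    // and "v3 != 0.0" then stays true forever.
    if (!std::isfinite(a) || !std::isfinite(p) || p <= 1.0) return false;

    double u1 = 1.0, u3 = a;  // invariant: u1 * a == u3 (mod p)
    double v1 = 0.0, v3 = p;  // invariant: v1 * a == v3 (mod p)
    while (v3 != 0.0) {
        double q = std::floor(u3 / v3);
        double t1 = u1 - q * v1;
        double t3 = u3 - q * v3;
        u1 = v1; u3 = v3;
        v1 = t1; v3 = t3;
    }
    if (u3 != 1.0) return false;      // gcd(a, p) != 1
    r = (u1 < 0.0) ? u1 + p : u1;     // normalise into [0, p)
    return true;
}
```

For example, `inv_mod(3, 7, r)` sets `r` to 5, while a NaN argument is rejected before the loop is entered.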

jamesjer avatar Dec 02 '16 18:12 jamesjer

The failing tests all seem to use Modular<Givaro::Integer> as the Field type. I tried seeding the random number generators with identical values on an x86_64 and a PPC64 machine, so I could use binary search to find where they start to differ. But that's not working because this code reseeds the random number generator with the current time in several places. I have found several, but apparently haven't tracked them all down yet. I'm happy to help the developers debug this issue, but could you please give me a list of every place where the random number generator is reseeded in both givaro and fflas-ffpack? I'd like to encourage you to stop doing this. It makes this kind of debugging impossible. Seed the generators once at the very beginning of the execution of a program and then leave them alone.
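
The "seed once" pattern can be sketched as follows, assuming a test harness that passes a single generator down to every helper. The function name `random_residues` is hypothetical and is not part of givaro or fflas-ffpack. The raw output of `std::mt19937_64` is fixed by the C++ standard, so the same seed gives the same sequence on x86_64 and PPC64; the sketch reduces with `%` rather than `std::uniform_int_distribution`, whose output may differ between standard library implementations (at the cost of a slight bias).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draws n residues below `modulus` from the generator
// the caller owns. It never reseeds, so the whole run is determined by the
// one seed chosen at program start.
std::vector<std::uint64_t> random_residues(std::mt19937_64 &gen,
                                           std::size_t n,
                                           std::uint64_t modulus) {
    std::vector<std::uint64_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(gen() % modulus);  // reproducible, slightly biased
    return out;
}
```

A driver would construct one `std::mt19937_64 gen(seed)` from a command-line seed, print that seed, and pass `gen` by reference everywhere random data is needed.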

jamesjer avatar Dec 29 '16 23:12 jamesjer

At least part of the problem appears to be that fflas-ffpack/field/rns-double.h, fflas-ffpack/field/rns-double.inl, and fflas-ffpack/field/rns-double-recint.inl access the limbs of an mpz_t 16 bits at a time. While the limbs of an mpz_t are in little endian order, the bytes in a limb are in host byte order. However, code in those 3 files appears to assume that the bytes are in little endian order. Look for uint16_t declarations in those files and examine how they are used. I tried throwing together a quick patch for the problem but, alas, I'm still seeing NaNs in the matrix, so either I did not fix the problem correctly or there is yet another problem somewhere.
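
The difference between the two access styles can be sketched on a single 64-bit value. This is an illustration only, assuming a 64-bit limb; the function names are hypothetical and the code is not taken from the rns-double files. Extracting chunks with shifts gives the same result on every host, while reinterpreting the limb's memory as `uint16_t` values yields them in host byte order, which reverses the chunk order on big-endian machines.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Splits a limb into four 16-bit chunks, least significant first.
// Shifts operate on the value, so the result is the same on any host.
std::array<std::uint16_t, 4> chunks_by_shift(std::uint64_t limb) {
    std::array<std::uint16_t, 4> out{};
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::uint16_t>(limb >> (16 * i));
    return out;
}

// Reinterprets the limb's bytes as four uint16_t values. The chunk order
// follows the host byte order: least significant first on little-endian
// hosts, most significant first on big-endian hosts.
std::array<std::uint16_t, 4> chunks_by_memory(std::uint64_t limb) {
    std::array<std::uint16_t, 4> out{};
    std::memcpy(out.data(), &limb, sizeof limb);
    return out;
}
```

On x86_64 both functions agree, which would explain why code written in the second style passes there and fails on mips, powerpc, s390x, hppa and sparc64.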

jamesjer avatar Jan 03 '17 23:01 jamesjer

Actually, my quick patch DOES fix test-fgemm, but not test-lu. So I did something right. :-) Perhaps somebody else can see what I either did wrong or left out. The patch can be viewed here: http://jamezone.org/pleasure/software/fflas-ffpack-endian.patch

jamesjer avatar Jan 03 '17 23:01 jamesjer

I can confirm that with the latest patch fflas-ffpack passes the test-suite on s390x - https://s390.koji.fedoraproject.org/koji/taskinfo?taskID=2438443

sharkcz avatar Jan 04 '17 10:01 sharkcz

Great! And a scratch build for rawhide shows only the ppc64 task hanging: https://koji.fedoraproject.org/koji/taskinfo?taskID=17166024, so all the other test failures go away with this patch. We're still getting NaNs in test-lu with ppc64, though, and I don't know why. :-(

jamesjer avatar Jan 05 '17 04:01 jamesjer

Pull request created: https://github.com/linbox-team/fflas-ffpack/pull/72

jamesjer avatar Jan 05 '17 04:01 jamesjer

The problem with NaNs on ppc64 appears to be due to a bug in ATLAS. The same sources succeed with openblas. The fflas-ffpack code invokes cblas_sgemm with some very ordinary-looking matrices, and down inside the ATLAS code (specifically, ATL_USERMM in ppc64_base/src/blas/gemm/KERNEL/ATL_sNBmm_b0.c), the NaNs are generated. So I think the maintainers should have a look at the pull request, and that should be the end of this issue.
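
One way to localise this kind of problem is to scan the output matrix for NaNs immediately after the BLAS call. The helper below is a hypothetical debugging aid, not part of fflas-ffpack or ATLAS, and it assumes a row-major matrix with leading dimension `ldc`.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical debugging aid: returns the flat index (i * ldc + j) of the
// first NaN in the m-by-n row-major matrix C, or -1 if there is none.
// Padding between column n and ldc is deliberately not inspected.
long first_nan(const float *C, std::size_t m, std::size_t n, std::size_t ldc) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (std::isnan(C[i * ldc + j]))
                return static_cast<long>(i * ldc + j);
    return -1;
}
```

Calling it on the inputs before the multiplication and on the output afterwards shows whether the NaNs were already present or were produced inside the BLAS routine.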

jamesjer avatar Jan 06 '17 00:01 jamesjer

@jamesjer's patches fixed the build on the big endian architectures in Debian! [1]

We tried using -fno-strict-aliasing for armel as Fedora does, but test-lu is still failing. [2] (It worked for me on a local schroot, but failed on Debian's build machines.)

On sparc64, test-pluq-check is still failing. [3]

[1] https://buildd.debian.org/status/logs.php?pkg=fflas-ffpack&ver=2.2.2-3&suite=sid
[2] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=armel&ver=2.2.2-4&stamp=1483919946
[3] https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=sparc64&ver=2.2.2-4&stamp=1483911466

d-torrance avatar Jan 09 '17 00:01 d-torrance

Catching up with this thread. Thanks for catching this bug. I was clueless, as I did not have access to big-endian archs. Since you all seem to consider that PR #72 fixes it, and the code looks fine, I'm happy to merge it. Regarding the random seeding, sorry about this mess. We only recently started to have a systematic option to seed the generators, and have not yet cleaned up all the old pieces of code where the seed is taken from the time. Will do.

ClementPernet avatar Jan 10 '17 16:01 ClementPernet

@d-torrance What blas implementation do you use on armel and sparc64? Also, I'm not certain that -fno-strict-aliasing actually does anything useful. GCC doesn't warn about any aliasing issues. It might be a fluke that I added that option just when some other factor made the armel build failures go away.

jamesjer avatar Jan 13 '17 05:01 jamesjer

@jamesjer Right now all architectures are using the default Netlib BLAS which comes with LAPACK.

d-torrance avatar Jan 13 '17 15:01 d-torrance

Updating with the still outstanding issues:

armel (build log) Failing tests:

  • test-lu
  • test-rankprofiles
  • test-fgemm

sparc64 (build log) Failing tests:

  • test-pluq-check

d-torrance avatar Aug 12 '17 20:08 d-torrance

I've just packaged version 2.3.2 for Debian, and test-pluq-check is still failing on sparc64, along with test-invert-check and test-charpoly-check. Not sure about armel yet.

Build log

FAIL: test-pluq-check
=====================

terminate called after throwing an instance of 'FailureTrsmCheck'
FAIL test-pluq-check (exit status: 134)

FAIL: test-invert-check
=======================

 -q 131071 -n 0 -i 0 -s 1515633011249149
terminate called after throwing an instance of 'FailureFgemmCheck'
m= 480
FAIL test-invert-check (exit status: 134)

FAIL: test-charpoly-check
=========================

CHARPol server PLUQ : 0.00215793s (0.002157 cpu) [1]
CHARPol client CHECK: 0.00063777s (0.000706 cpu) [4]
CHARPol checked full: 0.0128729s (0.785647 cpu) [1]
72x72 charpoly verification successful
CHARPol server PLUQ : 0.00340104s (0.008869 cpu) [1]
CHARPol client CHECK: 0.000571012s (0.08921 cpu) [4]
CHARPol checked full: 0.0189021s (0.780353 cpu) [1]
89x89 charpoly verification successful
CHARPol server PLUQ : 0.000622034s (0.003978 cpu) [1]
CHARPol client CHECK: 0.000258923s (0.000254 cpu) [4]
CHARPol checked full: 0.00458193s (0.168126 cpu) [1]
FAIL test-charpoly-check (exit status: 138)

d-torrance avatar Jan 11 '18 18:01 d-torrance

Commit d8cd67d is likely to have fixed it.

ClementPernet avatar Jan 22 '18 21:01 ClementPernet

I renamed the issue since there's only one test on one architecture still failing with the Debian package of version 2.4.3, test-charpoly-check on sparc64.

From https://buildd.debian.org/status/fetch.php?pkg=fflas-ffpack&arch=sparc64&ver=2.4.3-1&stamp=1593511031&raw=0:

libtool: link: g++ -O2 -Wall -g -I.. -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -fabi-version=6 -fabi-version=6 -fopenmp -Wl,-z -Wl,relro -o test-charpoly-check test-charpoly-check.o -fopenmp  -lgivaro -lgmp -lgmpxx -lblas -llapack -fopenmp
../build-aux/test-driver: line 107: 318084 Bus error               "$@" > $log_file 2>&1
FAIL: test-charpoly-check
FAIL: test-charpoly-check
=========================

CHARPol server PLUQ : 9.98974e-05s (0 cpu) [1]
CHARPol client CHECK: 0.000381947s (0 cpu) [4]
CHARPol checked full: 0.00181389s (0 cpu) [1]
10x10 charpoly verification successful
CHARPol server PLUQ : 0.00118399s (0 cpu) [1]
CHARPol client CHECK: 0.000505924s (0 cpu) [4]
CHARPol checked full: 0.018975s (0 cpu) [1]
74x74 charpoly verification successful
CHARPol server PLUQ : 0.000689983s (0 cpu) [1]
CHARPol client CHECK: 0.000377893s (0 cpu) [4]
CHARPol checked full: 0.0131581s (0 cpu) [1]
FAIL test-charpoly-check (exit status: 138)

d-torrance avatar Jul 04 '20 16:07 d-torrance

I also got this test failure in a PPA build of the master branch on s390x in Ubuntu 18.04:

make[5]: Entering directory '/<<PKGBUILDDIR>>/tests'
../build-aux/test-driver: line 107: 15724 Aborted                 (core dumped) "$@" > $log_file 2>&1
FAIL: test-charpoly-check

...

FAIL: test-charpoly-check
=========================

CHARPol server PLUQ : 0.00052619s (0 cpu) [1]
CHARPol client CHECK: 0.000102282s (0 cpu) [4]
CHARPol checked full: 0.00742912s (0.001392 cpu) [1]
83x83 charpoly verification successful
CHARPol server PLUQ : 0.000433922s (0 cpu) [1]
terminate called after throwing an instance of 'FFPACK::CharpolyFailed'
FAIL test-charpoly-check (exit status: 134)

d-torrance avatar Mar 08 '21 21:03 d-torrance

Just saw the same problem on x86_64 with version 2.5.0. Probably intermittent.

FAIL: test-charpoly-check
=========================

CHARPol server PLUQ : 7.79629e-05s (6.2e-05 cpu) [1]
CHARPol client CHECK: 0.000102997s (4.2e-05 cpu) [4]
CHARPol checked full: 0.00166917s (0.000712 cpu) [1]
28x28 charpoly verification successful
CHARPol server PLUQ : 0.000334024s (0.000334 cpu) [1]
terminate called after throwing an instance of 'FFPACK::CharpolyFailed'
FAIL test-charpoly-check (exit status: 134)

collares avatar Jan 10 '22 20:01 collares