
Trigger a build

loganharbour opened this issue 4 years ago • 20 comments

Just switched libmesh to use new cloning methods, which sandbox things a bit more. Testing the changes.

loganharbour avatar Jul 21 '21 00:07 loganharbour

@roystgnr - let's chat about this tomorrow, please. These changes shouldn't alter any testing behavior except on mac. With this, I think that all of the failures (save for MOOSE Mac Test) aren't my doing. Just wanted to make sure.

loganharbour avatar Jul 21 '21 00:07 loganharbour

Well that's a nice grab bag of CI regressions.

ERROR: Unable to locate a modulefile for 'cppunit' in a Fetch and Branch job is definitely not my fault. Going out on a limb, I'd be willing to bet that the assertion in MOOSE at PhiZeroKernel.C, line 43 isn't a libMesh thing either.

We arbitrarily tossed 1e-11 into a unit test tolerance at some point, and now we're seeing 1.042e-11 instead on some builds? I'm not happy about that but I'm not too surprised or dismayed. We can bump up the test tolerance.

I'm kind of baffled by the fparser_ad.cc build failures. We're using weird linking options in that one but we're not even at the link stage! Or do the static-only linkage options make it impossible for us to do JIT stuff, and we have some macros under the hood reflecting that?

The GhostPointNeighbors code breaking NoAMR builds was probably my mistake, but regardless should be easy to fix.

roystgnr avatar Jul 21 '21 13:07 roystgnr

Yeah - I said that the mac failure is definitely my fault. As long as you agree that I haven't caused the rest, I'm good :)

Might not be a bad idea to start testing these more... that's up to you.

loganharbour avatar Jul 21 '21 19:07 loganharbour

I'm kind of baffled by the fparser_ad.cc build failures. We're using weird linking options in that one but we're not even at the link stage! Or do the static-only linkage options make it impossible for us to do JIT stuff, and we have some macros under the hood reflecting that?

And for this one... I have no clue. @lindsayad @dschwen ?

loganharbour avatar Jul 21 '21 19:07 loganharbour

Might not be a bad idea to start testing these more... that's up to you.

The failure to test these more was on me - years ago we had to repurpose our old BuildBot box at UT, and that was what had been doing regular builds of our weirder configurations.

I definitely don't want to test these on every new PR, but how hard would it be to make moosebuild run them on master every week or so, and spew out emails or autogenerate an issue or something when we see a failure?

roystgnr avatar Jul 21 '21 20:07 roystgnr

fparser is a black box to me

lindsayad avatar Jul 21 '21 21:07 lindsayad

fparser JIT requires dlopen. From dlopen.m4, we disable dlopen when someone configures with --enable-all-static:

  AS_IF([test "x$enableallstatic" = "xyes"], [ac_cv_cxx_dlopen=no])

but not, as far as I know, when someone configures with --disable-shared --enable-static, which only causes the libmesh libraries to be built statically.

jwpeterson avatar Jul 21 '21 21:07 jwpeterson

I'd say the compiler errors in the "all-static" config here are probably due to there being some JIT code that is not properly guarded by #ifdef LIBMESH_HAVE_FPARSER_JIT.

jwpeterson avatar Jul 21 '21 21:07 jwpeterson

I can fix the LIBMESH_HAVE_FPARSER_JIT problems

dschwen avatar Jul 22 '21 20:07 dschwen

I can reproduce the failure at 13 processors in Parallel sweep. In dbg mode I get:

  fvkernels/fv_adapt.adapt: No index 133 in ghosted vector.
  fvkernels/fv_adapt.adapt: Vector contains [162,181)
  fvkernels/fv_adapt.adapt: And ghost array {200,199,196,195,194,191,190,186,185,184,161,160,155,154,150,149,148,147,202,143,198,139,197,138,189,71,188,70,187,69,183,65,64,181,63,59,58,54,53,49,48,47,158,40,157,39,38,152,34,151,33,146,28,27,201,142,24,192,15}

Stack trace says that lookup is coming from MooseVariableDataFV<double>::fetchDoFValues, line 1141 - I'm not sure what that's looking up, though, or why we wouldn't have it ghosted.

roystgnr avatar Jul 23 '21 21:07 roystgnr

so what did change here?

lindsayad avatar Jul 26 '21 15:07 lindsayad

No way to be sure without bisecting. That parallel sweep test only gets run manually. Last run before this one was 5 months ago: https://civet.inl.gov/recipe_events/32443/

roystgnr avatar Jul 26 '21 16:07 roystgnr

Hah, and fv_adapt.adapt failed then, too...

roystgnr avatar Jul 26 '21 16:07 roystgnr

Previous run was 15 months ago, at which time fv_adapt tests didn't exist. So it's entirely possible that that test has never succeeded with 13 processors.

roystgnr avatar Jul 26 '21 16:07 roystgnr

Job Parallel sweep on ad8df18: invalidated by @lindsayad

Let's see if we fixed it!

moosebuild avatar Jul 28 '21 00:07 moosebuild

Job Coverage on 6642830 wanted to post the following:

Coverage

Coverage did not change

Full coverage report

This comment will be updated on new commits.

moosebuild avatar Feb 16 '22 22:02 moosebuild

#3169 should have fixed a couple --disable-foo builds, but the distributed stuff is more of a problem; turns out my new triangulator code isn't quite as distributed-mesh-agnostic as I had intended.

roystgnr avatar Feb 22 '22 20:02 roystgnr

Job Distributed make check sweep (debug, even) on 6642830: invalidated by @roystgnr

Kicking to see if the -np 4 failure is repeatable

moosebuild avatar Mar 01 '22 15:03 moosebuild

Kicking to see if the -np 4 failure is repeatable

It is not...

roystgnr avatar Mar 01 '22 20:03 roystgnr

This PR has been marked "do not merge" since we are no longer accepting PRs into the master branch. All new PRs should be made on the devel branch instead. Once this PR's target branch has been updated to devel, the "do not merge" label will be removed.

jwpeterson avatar Mar 02 '22 20:03 jwpeterson