cats Sporadic test errors since upgrading to Native 0.5

Most recently https://github.com/typelevel/cats/actions/runs/10556403923/job/29242594012

  [error] Error: Total 12066, Failed 0, Errors 12, Passed 12054
  [error] Error during tests:
  [error] 	cats.tests.MapSuite
  [error] 	cats.tests.ReducibleSuiteAdditional
  [error] 	cats.tests.BoundedEnumerableSuite
  [error] 	cats.tests.TraverseListSuiteUnderlying
  [error] 	cats.tests.NonEmptyAlternativeSuite
  [error] 	cats.tests.FunctorSuite
  [error] 	cats.tests.SemigroupKSuite
  [error] 	cats.tests.FunctionKLiftSuite
  [error] 	cats.tests.KleisliSuite
  [error] 	cats.tests.TupleSuite
  [error] 	cats.tests.IorSuite
  [error] 	cats.tests.WriterSuite

But this has been haunting us since we upgraded.

Aug 27 '24 18:08 armanbilge

Alright. What if we divide the tests into two groups for the Native build?

Sep 01 '24 07:09 danicheg

@danicheg Unfortunately the problem is likely a bug with MUnit or Scala Native itself. MUnit was hastily upgraded to multithreading.

Sep 05 '24 16:09 armanbilge

I wonder: are there other projects that suffer from a similar issue after upgrading to Scala Native 0.5.x? So far I have only seen it in Cats, but I may not be aware of all the projects around.

Sep 05 '24 17:09 satorg

It seems that when some tests fail, they do not fail because of any particular error – they just do not start:

2024-11-14T09:18:52.7986145Z [error] Error: Total 13429, Failed 0, Errors 8, Passed 13421
2024-11-14T09:18:52.7987715Z [error] Error during tests:
2024-11-14T09:18:52.7988920Z [error] 	cats.tests.FoldableOneAndSuite
2024-11-14T09:18:52.7990196Z [error] 	cats.tests.FoldableListSuite
2024-11-14T09:18:52.7991508Z [error] 	cats.tests.ReducibleNonEmptyListSuite
2024-11-14T09:18:52.7992926Z [error] 	cats.tests.FunctionKLiftCrossBuildSuite
2024-11-14T09:18:52.7994481Z [error] 	cats.tests.AlgebraInvariantSuite
2024-11-14T09:18:52.7995762Z [error] 	cats.tests.BifoldableSuite
2024-11-14T09:18:52.7996962Z [error] 	cats.tests.PartialOrderSuite
2024-11-14T09:18:52.7998191Z [error] 	cats.tests.MonadErrorSuite

I see these errors in one of the latest runs, but I cannot find any clue as to what caused them. They just failed 🤷

Nov 14 '24 16:11 satorg

I'm not sure how MUnit reports failed tests, but when using JUnit in the Scala Native project itself we get 2 categories:

Failed tests - basically failed assertions
Erroneous tests - tests during whose execution a fatal error occurred, e.g. a segmentation fault

Based on the log above, I think this falls into the latter category of fatal errors. That means we might need to investigate the MUnit / Scala Native boundary.
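
To make the two categories above concrete, here is a minimal sketch in plain Scala. `OutcomeModel` and `classify` are hypothetical names made up for illustration; this is not how MUnit or the Scala Native JUnit runtime actually reports results. It also uses an ordinary exception as a stand-in for a fatal error, because a real segmentation fault kills the process and cannot be caught like this.

```scala
import scala.util.control.NonFatal

object OutcomeModel {
  sealed trait Outcome
  case object Passed  extends Outcome
  case object Failed  extends Outcome // an assertion did not hold
  case object Errored extends Outcome // something else went wrong during execution

  // Hypothetical helper: runs a test body and classifies what happened.
  // The AssertionError case must come first, since NonFatal would match it too.
  def classify(body: () => Unit): Outcome =
    try {
      body()
      Passed
    } catch {
      case _: AssertionError => Failed
      case NonFatal(_)       => Errored
    }
}
```

Under this model, the "Failed 0, Errors 12" summary means no assertion was ever reported as violated, which is consistent with the suites dying for some other reason.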

Dec 03 '24 20:12 WojciechMazur

Here's another recent one. https://github.com/typelevel/cats/actions/runs/12486161618/job/34845910566#step:14:16127

  [error] Error: Total 13010, Failed 0, Errors 19, Passed 12991
  [error] Error during tests:
  [error] 	cats.tests.EvalSuite
  [error] 	cats.tests.RepresentableStoreTSuite
  [error] 	cats.tests.TraverseFilterListSuite
  [error] 	cats.tests.DeprecatedEitherSuite
  [error] 	cats.tests.PartialOrderSuite
  [error] 	cats.tests.TraverseListSuiteUnderlying
  [error] 	cats.tests.FoldableVectorSuite
  [error] 	cats.tests.FoldableLazyListSuite
  [error] 	cats.tests.PartialFunctionSuite
  [error] 	cats.tests.ShowSuite2
  [error] 	cats.tests.AndThenSuite
  [error] 	cats.tests.CokleisliSuite
  [error] 	cats.tests.ParallelSuite
  [error] 	cats.tests.ApplicativeErrorSuite
  [error] 	cats.tests.DeprecatedNonEmptyListSuite
  [error] 	cats.tests.ArraySeqSuite
  [error] 	cats.tests.MonadErrorSuite
  [error] 	cats.tests.EitherKSuite
  [error] 	cats.tests.TraverseSuiteAdditional

Jan 03 '25 18:01 armanbilge

> It seems that when some tests fail, they do not fail because of any particular error – they just do not start:

Yes, I believe what is happening is that a number of test runners execute in parallel, and each runner is assigned some subset of the total test suites. It seems that if a test runner encounters a fatal error (e.g. a segfault) in one of its suites, it dies, causing all other suites assigned to that runner and not yet completed to also be considered "errored".
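
The hypothesis above can be sketched as a toy model. Everything here is an assumption for illustration: `RunnerModel`, `runRunner`, `runAll` and the suite names are made up, each runner is assumed to work through its suites strictly in order, and nothing is taken from the MUnit or Scala Native sources.

```scala
object RunnerModel {
  sealed trait Status
  case object Passed  extends Status
  case object Errored extends Status

  // One runner processes its suites in order. Suites before the first crash
  // complete normally; the crashing suite and everything after it are
  // reported as errored because the runner is gone.
  def runRunner(suites: List[String], crashesIn: Set[String]): Map[String, Status] = {
    val (completed, lost) = suites.span(s => !crashesIn.contains(s))
    val passed: List[(String, Status)]  = completed.map(s => s -> Passed)
    val errored: List[(String, Status)] = lost.map(s => s -> Errored)
    (passed ++ errored).toMap
  }

  // Runners are independent: a crash in one does not affect the others.
  def runAll(runners: List[List[String]], crashesIn: Set[String]): Map[String, Status] =
    runners.flatMap(r => runRunner(r, crashesIn)).toMap
}
```

For example, `RunnerModel.runAll(List(List("A", "B", "C"), List("D", "E")), Set("B"))` marks `B` and `C` as errored while `A`, `D` and `E` pass. If the model holds, a single crashing suite would be enough to produce a list of seemingly unrelated errored suites, and the list would change from run to run with the assignment of suites to runners.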

Jan 03 '25 18:01 armanbilge

@armanbilge, I wonder whether those SEGFAULT errors could somehow be related to tests that check for stack safety issues?

I mean, there are plenty of tests in Cats that run quite memory-intensive calculations just to make sure that there are no stack overflow errors. So I'm wondering: could such tests actually be the culprits?

Jan 03 '25 20:01 satorg

If those tests were failing by overflowing the stack, that could manifest as a segfault. But if the tests use constant stack space (as they should), then it shouldn't be an issue. It's worth checking out :)
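
As a rough illustration of the difference between growing and constant stack usage (a generic sketch with made-up names `StackUse`, `sumNaive` and `sumLoop`, not code from the Cats test suite):

```scala
import scala.annotation.tailrec

object StackUse {
  // Stack depth grows linearly with n: each call has to wait for the
  // result of the next one before it can add n.
  def sumNaive(n: Long): Long =
    if (n == 0) 0L else n + sumNaive(n - 1)

  // Constant stack space: the recursive call is in tail position, and
  // @tailrec makes the compiler reject the method if it cannot turn it
  // into a loop.
  @tailrec
  def sumLoop(n: Long, acc: Long = 0L): Long =
    if (n == 0) acc else sumLoop(n - 1, acc + n)
}
```

`StackUse.sumLoop(1000000L)` completes regardless of the stack size, while `StackUse.sumNaive` with a large enough argument exhausts the stack. A stack safety test is only a candidate for the crashes if the code under test behaves like the first variant on Native.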

Jan 03 '25 20:01 armanbilge

That might be possible. Currently SN lacks proper handling for StackOverflowError and OutOfMemoryError. The first should be fixable by introducing canaries / signal handlers to recover from a stack overflow. Similarly, we could try to handle OOM errors. I'll try to work on a prototype this weekend.

Jan 03 '25 20:01 WojciechMazur