
Sporadic test errors since upgrading to Native 0.5

Status: Open. armanbilge opened this issue 1 year ago • 10 comments

Most recently https://github.com/typelevel/cats/actions/runs/10556403923/job/29242594012

  [error] Error: Total 12066, Failed 0, Errors 12, Passed 12054
  [error] Error during tests:
  [error] 	cats.tests.MapSuite
  [error] 	cats.tests.ReducibleSuiteAdditional
  [error] 	cats.tests.BoundedEnumerableSuite
  [error] 	cats.tests.TraverseListSuiteUnderlying
  [error] 	cats.tests.NonEmptyAlternativeSuite
  [error] 	cats.tests.FunctorSuite
  [error] 	cats.tests.SemigroupKSuite
  [error] 	cats.tests.FunctionKLiftSuite
  [error] 	cats.tests.KleisliSuite
  [error] 	cats.tests.TupleSuite
  [error] 	cats.tests.IorSuite
  [error] 	cats.tests.WriterSuite

But this has been haunting us since we upgraded.

armanbilge, Aug 27 '24

Alright. What if we split the tests into two groups for the Native build?

danicheg, Sep 01 '24

@danicheg Unfortunately, the problem is likely a bug in MUnit or in Scala Native itself. MUnit was hastily upgraded to multithreading.

armanbilge, Sep 05 '24

I wonder: are there other projects that suffer from a similar issue after upgrading to Scala Native 0.5.x? So far I have only seen it in Cats, but I may not be aware of every project out there.

satorg, Sep 05 '24

It seems that when some tests fail, they do not fail because of any particular error – they just do not start:

  2024-11-14T09:18:52.7986145Z [error] Error: Total 13429, Failed 0, Errors 8, Passed 13421
  2024-11-14T09:18:52.7987715Z [error] Error during tests:
  2024-11-14T09:18:52.7988920Z [error] 	cats.tests.FoldableOneAndSuite
  2024-11-14T09:18:52.7990196Z [error] 	cats.tests.FoldableListSuite
  2024-11-14T09:18:52.7991508Z [error] 	cats.tests.ReducibleNonEmptyListSuite
  2024-11-14T09:18:52.7992926Z [error] 	cats.tests.FunctionKLiftCrossBuildSuite
  2024-11-14T09:18:52.7994481Z [error] 	cats.tests.AlgebraInvariantSuite
  2024-11-14T09:18:52.7995762Z [error] 	cats.tests.BifoldableSuite
  2024-11-14T09:18:52.7996962Z [error] 	cats.tests.PartialOrderSuite
  2024-11-14T09:18:52.7998191Z [error] 	cats.tests.MonadErrorSuite

I see these errors in one of the latest runs, but I cannot find any clue as to what caused those failures. They just failed 🤷

satorg, Nov 14 '24

I'm not sure how MUnit reports failed tests, but when using JUnit in the Scala Native project itself we get two categories:

  • Failed tests: basically failed assertions
  • Errored tests: tests during whose execution a fatal error occurred, e.g. a segmentation fault

Based on the log above, I think this falls into the latter category of fatal errors. It means we might need to investigate the MUnit / Scala Native boundary.
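The failed/errored distinction can be modeled in a few lines. This is a hypothetical editorial sketch, not MUnit or JUnit code: the names `OutcomeClassification`, `Passed`, `Failed` and `Errored` are invented here, and it assumes (as in JUnit) that an `AssertionError` counts as a failure while any other non-fatal exception counts as an error. A real segfault cannot be caught this way at all; the process simply dies, which is why it can only show up as an error reported by the surrounding harness.

```scala
import scala.util.control.NonFatal

// Hypothetical model of how a harness might classify a test body's outcome.
object OutcomeClassification {
  sealed trait Status
  case object Passed extends Status
  case object Failed extends Status  // an assertion did not hold
  case object Errored extends Status // something else went wrong while running

  def classify(body: () => Unit): Status =
    try { body(); Passed }
    catch {
      case _: AssertionError => Failed // Predef.assert throws java.lang.AssertionError
      case NonFatal(_)       => Errored
      // Fatal errors (e.g. StackOverflowError) are not matched and propagate;
      // a segfault never reaches this code because the whole process crashes.
    }
}
```

For example, `classify(() => assert(1 == 2))` yields `Failed`, while `classify(() => throw new RuntimeException("boom"))` yields `Errored`.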

WojciechMazur, Dec 03 '24

Here's another recent one. https://github.com/typelevel/cats/actions/runs/12486161618/job/34845910566#step:14:16127

  [error] Error: Total 13010, Failed 0, Errors 19, Passed 12991
  [error] Error during tests:
  [error] 	cats.tests.EvalSuite
  [error] 	cats.tests.RepresentableStoreTSuite
  [error] 	cats.tests.TraverseFilterListSuite
  [error] 	cats.tests.DeprecatedEitherSuite
  [error] 	cats.tests.PartialOrderSuite
  [error] 	cats.tests.TraverseListSuiteUnderlying
  [error] 	cats.tests.FoldableVectorSuite
  [error] 	cats.tests.FoldableLazyListSuite
  [error] 	cats.tests.PartialFunctionSuite
  [error] 	cats.tests.ShowSuite2
  [error] 	cats.tests.AndThenSuite
  [error] 	cats.tests.CokleisliSuite
  [error] 	cats.tests.ParallelSuite
  [error] 	cats.tests.ApplicativeErrorSuite
  [error] 	cats.tests.DeprecatedNonEmptyListSuite
  [error] 	cats.tests.ArraySeqSuite
  [error] 	cats.tests.MonadErrorSuite
  [error] 	cats.tests.EitherKSuite
  [error] 	cats.tests.TraverseSuiteAdditional

armanbilge, Jan 03 '25

> It seems that when some tests fail, they do not fail because of any particular error – they just do not start:

Yes, I believe what is happening is that a number of test runners execute in parallel, and each runner is assigned a subset of the test suites. It seems that if a runner encounters a fatal error (e.g. a segfault) in one of its suites, it dies, and every other suite assigned to that runner that has not yet completed is also reported as "errored".
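The mechanism described above can be illustrated with a small simulation. This is a hypothetical editorial sketch, not how MUnit or sbt actually schedule suites: the round-robin assignment, the object name `RunnerCrashModel` and the suite names are all assumptions. It shows why the summary can report `Failed 0` together with several errors: one crash takes down the rest of that runner's queue.

```scala
// Hypothetical model of the failure mode; not MUnit or sbt code.
object RunnerCrashModel {
  sealed trait Outcome
  case object Passed extends Outcome
  case object Errored extends Outcome

  // Suites are dealt to runners round-robin (an assumption; the real scheduler may differ).
  // If a runner crashes on `crashingSuite`, that suite and every suite still queued
  // behind it on the same runner are reported as errored, although none of them
  // had a failing assertion.
  def simulate(suites: Vector[String], runners: Int, crashingSuite: String): Map[String, Outcome] = {
    val queues: Iterable[Vector[String]] =
      suites.zipWithIndex.groupBy { case (_, i) => i % runners }.values.map(_.map(_._1))
    queues.flatMap { queue =>
      val crashAt = queue.indexOf(crashingSuite)
      queue.zipWithIndex.map { case (suite, i) =>
        val outcome: Outcome = if (crashAt >= 0 && i >= crashAt) Errored else Passed
        suite -> outcome
      }
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val suites = Vector.tabulate(12)(i => s"Suite$i")
    val outcomes = simulate(suites, runners = 3, crashingSuite = "Suite4")
    // No suite has a failing assertion, yet three are reported as errored.
    val errored = suites.filter(s => outcomes(s) == Errored)
    println(s"Errored: ${errored.mkString(", ")}") // prints Errored: Suite4, Suite7, Suite10
  }
}
```

With 12 suites over 3 runners, `Suite4` shares a runner with `Suite7` and `Suite10`, so the single crash surfaces as three errored suites, much like the lists of unrelated suites in the logs above.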

armanbilge, Jan 03 '25

@armanbilge, I wonder whether those segfaults could somehow be related to the tests that check for stack safety.

I mean, there are plenty of tests in Cats that run quite memory-intensive computations just to make sure there are no stack overflow errors. So I'm wondering: could such tests actually be the culprits?

satorg, Jan 03 '25

If those tests were failing by overflowing the stack, that could manifest as a segfault. But if the tests use constant stack space (as they should), it shouldn't be an issue. It's worth checking out :)
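To make "constant stack space" concrete, here is a minimal sketch using the standard library's `scala.util.control.TailCalls` trampoline, chosen only to keep the example dependency-free; Cats' stack-safety tests typically exercise types such as `Eval` instead. The object and method names are hypothetical. The naive version pushes one frame per step and would overflow for large inputs (on Scala Native, possibly as a process crash), while the trampolined version runs its steps in a loop.

```scala
import scala.util.control.TailCalls._

object StackSafetySketch {
  // Naive mutual recursion: every call pushes a new stack frame, so a large
  // enough n overflows the stack.
  def isEvenNaive(n: Int): Boolean = if (n == 0) true else isOddNaive(n - 1)
  def isOddNaive(n: Int): Boolean = if (n == 0) false else isEvenNaive(n - 1)

  // Trampolined version: `tailcall` suspends each step and `result` runs the
  // steps in a loop, so stack depth stays constant regardless of n.
  def isEven(n: Int): TailRec[Boolean] = if (n == 0) done(true) else tailcall(isOdd(n - 1))
  def isOdd(n: Int): TailRec[Boolean] = if (n == 0) done(false) else tailcall(isEven(n - 1))

  def main(args: Array[String]): Unit = {
    println(isEven(1000000).result) // prints true, without growing the stack
    // isEvenNaive(1000000) would likely overflow the stack, so it is not called here.
  }
}
```

A stack-safety test that regresses from the second shape to the first would pass on small inputs but could crash the runner on large ones, which is the scenario raised above.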

armanbilge, Jan 03 '25

That might be possible. Currently SN lacks proper handling of StackOverflowError and OutOfMemoryError. The first should be fixable by introducing canaries / signal handlers to recover from a stack overflow. Similarly, we could try to handle OOM errors. I'll try to work on a prototype this weekend.

WojciechMazur, Jan 03 '25