chiseltest icon indicating copy to clipboard operation
chiseltest copied to clipboard

TimeoutException should dump better debug information

Open edwardcwang opened this issue 5 years ago • 11 comments

Is there a way to catch TimeoutException in user code? It seems like the exception gets thrown and there is no way to catch it (user code is not even part of the stacktrace) and process it in user code (e.g. to reformat the message, provide extra context, etc).

- should connect ports and work *** FAILED ***
  chisel3.tester.TimeoutException: timeout on Clock(IO clock in MyModule) at 1000 idle cycles
  at chisel3.tester.TreadleBackend.$anonfun$run$5(TreadleBackend.scala:188)
  at chisel3.tester.TreadleBackend.$anonfun$run$5$adapted(TreadleBackend.scala:185)
  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:145)
  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:235)
  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:228)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:145)
  at chisel3.tester.TreadleBackend.run(TreadleBackend.scala:185)
  at chisel3.tester.internal.Context$.$anonfun$run$1(Testers2.scala:25)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)

edwardcwang avatar Apr 30 '19 23:04 edwardcwang

If I recall correctly, the timeout is being thrown in the infrastructure code, which is a layer above the main test body (which I think is also being run in another thread). I think you can wrap the entire test (new MuDut()) { ... } construct in a try-catch, and it might work?

Otherwise, can you give me an example of the use case? I'm not sure how this is best handled especially given the multithreaded nature of tests. Currently, uncaught exceptions in any thread are propagated up to the main thread, then all other threads interrupted and terminated. If you wanted to catch the TimeoutException, in which thread should it fire? Just the main thread? Or is it fine to it have it as an infrastructure-level exception that can't be caught within the test body?

Also, the intent behind TimeoutException is to allow recovery from infinite loops without killing and restarting sbt - it's essentially a last-ditch save. We could also think of other mechanisms to support your use case.

ducky64 avatar Apr 30 '19 23:04 ducky64

The short-term problem which I ran into is the case of writing a test library. I wanted a way to present a friendlier message to the user about some more details of the test that failed, with the current stacktrace it's not immediately obvious where the fault came from. (i.e. yes, the test failed, but what was the test library doing that led to it?)

Code wise I wanted to do something like: (not super fixed on this API, just need some way to express this intent)

try {

fork {
  ...
} .fork {
  ...
} .join

} catch {
case e: TimeoutException: reportError("Your foo bar test failed")
}

edwardcwang avatar Apr 30 '19 23:04 edwardcwang

One idea for debugging was to allow named timescopes, which would be useful in localizing assertions (so you could get something like "assertion x != 1 failure in MyTransaction: SPI: bit 0: high", and not just a bunch of line numbers / stack trace). That might also provide more context for timeout failures.

That being said, this strategy certainly can't provide as much information as catch-rethrow, but it's also more lightweight. If you're already using timescopes, you just need to tack a name onto them, and you get assertion and timeout context essentially free. I think wrapping everything in try-catch is a bit cumbersome.

Anyways, I still don't think catching a TimeoutException is a good pattern (feel free to convince me otherwise?) - instead, if something is expected to finish within some time, that should be expressed with more intent-exposing constructs, eg expectWithin.

ducky64 avatar Apr 30 '19 23:04 ducky64

I'm fine with using named timescopes; then, is there a way to get the extra information out of it? It would make test libraries more pleasant to use if test libraries can help provide extra information. The main thing is to make the user aware of what the test library was doing that caused this.

val t = timescope(name="my_foo_bar", timeout=1000) {
fork { ...} .fork { ... } .join
}
if (t.hasExceptions) {
  // get exception and report it
}

edwardcwang avatar Apr 30 '19 23:04 edwardcwang

The idea is for the named timescope to provide information on what the library was doing (eg, SPI transaction, on bit 0, on the clock-high half-cycle). It wouldn't allow you to put in logic after-the-fact (eg, post-processing error messages) as you might when catching an exception, but you can have generated timescope names beforehand since the timescope name would take a string argument.

ducky64 avatar Apr 30 '19 23:04 ducky64

Is it possible to expand the named timescope idea to take a function or some user logic? e.g. timescope(reportingFunc=myReportingFunc, timeout=1000) { ... }

It could also be a trait, object, or something else. Which ever API is fine, as long as there's a relatively easy way for test library writers to add information beyond just a string.

edwardcwang avatar Apr 30 '19 23:04 edwardcwang

Also, if there's an experimental workaround where it's possible to just catch the TimeoutException directly, that's fine too. I understand developing a clean API takes time and thought.

edwardcwang avatar Apr 30 '19 23:04 edwardcwang

So in my view there's two orthogonal issues here:

  • Should TimeoutException be catchable in user test code? And which thread should get the exception if there are multiple threads? Note if all threads get the exception, and nothing in the thread catches it, you will get console spam indicating uncaught exception(s), which could just add to the confusion.
  • How should the timescope name stack (or other context-dependent testers2 data) be obtained for the purposes of reporting exceptions? Is this sufficiently beneficial to outweigh the cost of adding special-case constructs? One possibility that I think is relatively clean would be to have a special Testers2Exception (which could be extended by more specific exceptions) that captures the current context-specific testers2 data that is known, like timescope name stack. I'm not convinced adding a reporting function is the best solution, since it doesn't seem super composable - it seems like it's replacing a try/catch block without looking like a try/catch block.

ducky64 avatar May 01 '19 00:05 ducky64

Can we have TimeoutException as a subclass of Testers2Exception?

edwardcwang avatar May 01 '19 00:05 edwardcwang

Sure, but you still need to determine which thread gets the TimeoutException, or, if it stays an infrastructure-level exception not visible from user test code, but rather the code invoking the driver, which thread's context does it capture from.

ducky64 avatar May 01 '19 01:05 ducky64

Typing up resolutions from two weeks ago... When a TimeoutException fires, dump debugging info from all the threads. It should provide good debugging trace information (especially if we have waitFor as in #39), though its won't be catchable from within a test.

ducky64 avatar May 19 '19 01:05 ducky64