swift-apis
Session Crashes in Colab & Jupyter Lab
I have noticed that whenever the input to a function doesn't satisfy its precondition, the session crashes in both Google Colab and Swift-Jupyter Lab. Here is the reference notebook for the same
This is a fundamental problem with how swift-jupyter is implemented, and we don't have any short-term solutions to the problem. I'll keep this open and add an "open-design-questions" label because it would be nice to solve this problem eventually.
What is happening is that these precondition failures crash the program, putting it in an undefined state. LLDB (the thing that actually powers swift-jupyter's Swift compilation and execution) tries to recover by resetting the program to a reasonable state, but it doesn't always succeed.
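To make the failure mode concrete, here is a minimal sketch. The `dot` function is a hypothetical stand-in for a Tensor API with a shape precondition; it is not taken from swift-apis. The point it illustrates is that a failed `precondition` traps and terminates the process instead of throwing, so `do`/`catch` cannot intercept it.

```swift
// Hypothetical stand-in for a tensor operation with a shape precondition.
// This is an illustration only, not the swift-apis implementation.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    // If this check fails, the process traps. No Swift error is thrown,
    // so there is nothing for a `catch` block to handle.
    precondition(a.count == b.count, "Shapes must match")
    return zip(a, b).reduce(0) { sum, pair in sum + pair.0 * pair.1 }
}

// Valid input: the precondition holds and the call returns normally.
let ok = dot([1, 2, 3], [4, 5, 6])
print(ok)

// Invalid input: uncommenting the next line would trap. In a notebook,
// this is the kind of failure that LLDB then has to try to recover from.
// _ = dot([1, 2], [1, 2, 3])
```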
We brainstormed some ideas for fixing this, but none of the good ones will be easy to do any time soon:
- Make the Tensor APIs return garbage (e.g. `Tensor(0)`) when there is an error, instead of crashing the program. This would be easy to implement, but it's not good to return results that look successful when there was an error.
- Turn the `Tensor` type into an enum with an error case, and return that on errors. This might be a good solution, but it's a pretty significant API change and will have lots of consequences to deal with.
- Change all the Tensor APIs to throwing functions, so that Swift's native error handling mechanism can deal with errors. This is probably bad, because we don't want to force users to write `try` in front of all their tensor operations.
- Add an exception mechanism to Swift that can properly unwind the stack on errors. This is a nice solution, but it's lots of work and it's unlikely that Swift would accept this feature, because Swift already has its own non-exception-based error handling mechanism.
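As a rough illustration of the second and third ideas, here is a sketch that uses plain `[Float]` arrays in place of `Tensor`. Every name in it (`ShapeError`, `MaybeTensor`, `addThrowing`, `addChecked`) is hypothetical and chosen only for this example; none of them exist in swift-apis, and a real design would look different.

```swift
// Hypothetical error type for this sketch; not part of swift-apis.
struct ShapeError: Error {
    let message: String
}

// Throwing-function idea: errors use Swift's native mechanism,
// but every call site has to be marked with `try`.
func addThrowing(_ a: [Float], _ b: [Float]) throws -> [Float] {
    guard a.count == b.count else {
        throw ShapeError(message: "Shapes must match: \(a.count) vs \(b.count)")
    }
    return zip(a, b).map { pair in pair.0 + pair.1 }
}

// Enum-with-error-case idea: the failure travels inside the value,
// so call sites need no `try`, but every operation must handle it.
enum MaybeTensor {
    case value([Float])
    case invalid(String)
}

func addChecked(_ a: MaybeTensor, _ b: MaybeTensor) -> MaybeTensor {
    switch (a, b) {
    case (.invalid, _):
        return a  // Propagate an error produced by an earlier operation.
    case (_, .invalid):
        return b
    case let (.value(x), .value(y)):
        guard x.count == y.count else {
            return .invalid("Shapes must match: \(x.count) vs \(y.count)")
        }
        return .value(zip(x, y).map { pair in pair.0 + pair.1 })
    }
}

// Both variants report the mismatch without trapping.
do {
    _ = try addThrowing([1, 2], [1, 2, 3])
} catch {
    print("Recovered from: \(error)")
}

if case .invalid(let reason) = addChecked(.value([1, 2]), .value([1, 2, 3])) {
    print("Invalid result: \(reason)")
}
```

In both variants the process keeps running after the shape mismatch, which is what a notebook session needs. The cost is the one described in the list: `try` at every call site in one case, and a changed `Tensor` type that every operation has to account for in the other.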
Related discussion: https://forums.swift.org/t/force-unwrapping-try-and-fatalerror-in-the-lldb-repl-cause-memory-leaks/20823
I made a post on the SIG a couple of weeks ago: https://groups.google.com/a/tensorflow.org/forum/m/#!topic/swift/uACgFQJqzEI. These issues might be related.