mlx-swift
error handling in mlx-swift
For discussion:
The mx::core library (the backend of mlx-swift) is written in C++ and uses C++ exceptions to signal errors. Swift has no way to catch these exceptions and is not exception safe -- that is, if an exception were thrown and unwound through stack frames implemented in Swift, it would likely result in memory corruption, leaks or other poor behavior.
All (all!) mlx operations might throw. For example:
- creating a new MLXArray -- illegal shape, allocation errors
- loading weights from disk (loadArrays()) -- I/O errors, file corruption, etc.
- adding two MLXArrays (the + operator) -- broadcast errors
- eval -- the prepared graph might have an eval-time error
Initially mlx-swift treated (most of) these C++ exceptions as programmer errors -- they were fatal errors that logged a message and exited the program. This is consistent with the way swift handles various programmer errors:
let a = 10
let b = 0
/// fatal error: divide by zero, exits
print(a / b)
var array = [Int]()
array.append(10)
/// fatal error: array index out of bounds, exits
print(array[20])
If the value of b or the index come from user input, it is up to the program to guard against these cases if they do not want to crash.
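As a minimal sketch of what guarding against these cases might look like (the helper names safeDivide and element(of:at:) are hypothetical, introduced only for illustration):

```swift
// Hypothetical helpers: validate untrusted input before it reaches a
// trapping operation.

/// Returns nil instead of trapping when the division cannot be performed.
func safeDivide(_ a: Int, by b: Int) -> Int? {
    // Division by zero traps, and so does Int.min / -1 (overflow).
    guard b != 0, !(a == Int.min && b == -1) else { return nil }
    return a / b
}

/// Returns nil instead of trapping when the index is out of range.
func element(of array: [Int], at index: Int) -> Int? {
    guard array.indices.contains(index) else { return nil }
    return array[index]
}

var values = [Int]()
values.append(10)

if let quotient = safeDivide(10, by: 0) {
    print(quotient)
} else {
    print("cannot divide by zero")
}

if let value = element(of: values, at: 20) {
    print(value)
} else {
    print("index 20 is out of range")
}
```

The checks are cheap here because the failure conditions are simple; the MLX cases discussed next are harder to validate up front.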
mlx-swift follows this same principle, but the failure cases are a little more complex, such as broadcasting or indexing. Additionally the program may load weights that are downloaded from the network and these may have unexpected shapes or dtypes (kind of the case of b above). For a command line tool this behavior might be acceptable: the program crashes and prints an error and you investigate the bug/bad data. For an application used by a user other than the programmer, this isn't great -- you don't want your application to crash because the weights on a model got updated.
Note: loadArrays() is marked as throws (and always has been), though initially it could not handle some types of errors that might occur.
In a recent release the withError construct was added to give a capability to catch the C++ exceptions:
- https://swiftpackageindex.com/ml-explore/mlx-swift/main/documentation/mlx/witherror(_:)-6g4wn
try withError {
    let a = MLXArray(0 ..< 10, [2, 5])
    let b = MLXArray(0 ..< 15, [3, 5])
    // this will trigger a broadcast error
    return a + b
}
It is sort of like an autoreleasepool for the C++ exceptions. To be clear: this does not cause execution to stop and return with an error when the exception occurs -- it collects the first exception in Task local state and when the block returns it will throw that if there was an exception thrown.
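To illustrate the "collect the first error, throw when the block returns" behavior, here is a toy model in plain Swift. It is not the mlx-swift implementation: every name in it (SketchError, ErrorBox, ErrorScope, report, withCollectedError, add) is hypothetical, and the report function merely stands in for the point where a C++ exception would be intercepted.

```swift
// A toy model only; none of these names come from mlx-swift.

struct SketchError: Error, Equatable {
    let message: String
}

/// Holds the first error reported inside a scope.
final class ErrorBox: @unchecked Sendable {
    var first: SketchError?
}

enum ErrorScope {
    /// Task-local slot; nil means no collecting scope is active.
    @TaskLocal static var current: ErrorBox?
}

/// Stands in for the place where a C++ exception would be intercepted.
func report(_ message: String) {
    guard let box = ErrorScope.current else {
        fatalError(message)  // no scope active: behave like a fatal error
    }
    if box.first == nil {
        box.first = SketchError(message: message)
    }
}

/// Runs `body` to completion, then throws the first collected error, if any.
func withCollectedError<R>(_ body: () -> R) throws -> R {
    let box = ErrorBox()
    let result = ErrorScope.$current.withValue(box) { body() }
    if let error = box.first { throw error }
    return result
}

/// A stand-in operation that reports an error but still returns a value.
func add(_ a: [Int], _ b: [Int]) -> [Int] {
    guard a.count == b.count else {
        report("shape mismatch: \(a.count) vs \(b.count)")
        return []
    }
    return zip(a, b).map { $0 + $1 }
}

var stepsRun = 0
do {
    _ = try withCollectedError { () -> [Int] in
        let c = add([1, 2], [1, 2, 3])  // reports an error; execution continues
        stepsRun += 1
        let d = add(c, [1])             // reports again; only the first is kept
        stepsRun += 1
        return d
    }
} catch {
    print("caught after the block finished: \(error)")
}
```

Note that both steps inside the block run even though the first one failed, which is the main way this differs from an ordinary Swift throw.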
The withError approach is not really the Swift style, however: Swift errors are always declared, and you must use try and either catch the error or declare that the function throws.
One option would be to declare all mlx operations as throws:
let a = try MLXArray(0 ..< 10, [2, 5])
let b = try MLXArray(0 ..< 15, [3, 5])
// this will trigger a broadcast error
let c = try a + b
Every call will require a try because they can all throw. The advantage is that it looks like Swift and is explicit. The disadvantage is that it doesn't look like MLX :-)
- Could we have two variants of every call?
- note: there is a performance cost to set up the handler to catch the potential error
- is there something better than withError?
- can we make this more Swifty somehow?
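To make the "two variants" question concrete, here is a sketch of what a throwing variant and a trapping variant of one operation could look like. Vec, VecError and adding(_:) are hypothetical stand-ins, not MLX API.

```swift
// Hypothetical stand-in for an array type; not the MLX API.
struct Vec: Equatable {
    var values: [Int]
}

enum VecError: Error, Equatable {
    case shapeMismatch(Int, Int)
}

extension Vec {
    /// Throwing variant: explicit and recoverable, but every call needs `try`.
    func adding(_ other: Vec) throws -> Vec {
        guard values.count == other.values.count else {
            throw VecError.shapeMismatch(values.count, other.values.count)
        }
        return Vec(values: zip(values, other.values).map { $0 + $1 })
    }

    /// Trapping variant: reads like MLX and treats a mismatch as a
    /// programmer error.
    static func + (lhs: Vec, rhs: Vec) -> Vec {
        do {
            return try lhs.adding(rhs)
        } catch {
            fatalError("Vec +: \(error)")
        }
    }
}

let a = Vec(values: [1, 2, 3])
let b = Vec(values: [10, 20, 30])
let sum = a + b  // shapes under program control

do {
    _ = try a.adding(Vec(values: [1]))  // shapes from untrusted input
} catch {
    print("recoverable: \(error)")
}
```

The cost of this design is that the API surface doubles, and callers have to know which variant fits their situation.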
It cannot catch this exception: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted)
Some more thoughts. Swift has 3 general mechanisms for handling errors:
Failable
A function or property returns an optional value. This is often used for domain errors or failures where the cause would be obvious. For example:
- Int(String) returns an optional value -- it will be nil if the String can't be parsed
- YourEnum(rawValue: Int) returns an optional value -- it will be nil if the raw value is out of range
- Dictionary[key] returns an optional value -- it will be nil if the key is not in the dictionary
The lack of error means that the caller can't know the details of why it failed, but aside from parsing a String -> Int, there aren't any details to be had.
These types of failures are expected to occur and are not (necessarily) programming errors, and the system forces you to deal with them via guard/let, if/let or ! (the latter turns it into a programming error if not satisfied).
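A short sketch of the three failable cases above and the ways of handling them. The Direction enum, the ages dictionary and the parsePort helper with its 8080 fallback are made up for illustration.

```swift
enum Direction: Int {
    case north = 0, east, south, west
}

let ages = ["alice": 30]

/// guard/let: handle the failure explicitly.
func parsePort(_ text: String) -> Int {
    guard let port = Int(text) else {
        return 8080  // hypothetical fallback for this sketch
    }
    return port
}

let parsed = Int("12x")                 // nil: not a number
let direction = Direction(rawValue: 7)  // nil: raw value out of range
let age = ages["bob"]                   // nil: key not present

// if/let: only use the value when it exists.
if let known = ages["alice"] {
    print("alice is \(known)")
}

// `!` turns a missing value into a fatal error, i.e. a programmer error:
// let crash = ages["bob"]!
```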
Some older (bridged) APIs may also use an optional value where an error might have been more appropriate. CGImageSourceCreateWithURL, for example, returned NULL in C to indicate an error. The error might be that the file doesn't exist, couldn't be read, or the format was unknown -- there is no way to tell.
Note that some languages treat these as exceptions. Java will throw for the Int case and Python also has an exception for the Dictionary case.
Throws Error
Functions that have expected error conditions, like API dealing with files or networks, are typically marked with throws. Callers can decide what to do with these errors and there are mechanisms for getting information to present to the user. At the very least they can show an alert with an error message.
These are typically used with errors where you can't validate inputs without performing the operation (race condition or expense) or transient errors like network conditions. They are often used to indicate a problem with user input, like a file of unknown format.
Callers are forced to deal with these errors or declare that they defer to the caller.
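A sketch of the throwing style, using a file whose format can only be validated by reading it. The loader, the WeightsError type and the "WGT1" magic prefix are all hypothetical; only Data(contentsOf:) is a real Foundation call.

```swift
import Foundation

enum WeightsError: Error, Equatable {
    case unknownFormat(String)
}

/// Hypothetical loader: the format can only be checked by reading the file.
func loadHeader(at url: URL) throws -> Data {
    let data = try Data(contentsOf: url)  // I/O errors propagate to the caller
    guard data.starts(with: Array("WGT1".utf8)) else {
        throw WeightsError.unknownFormat(url.lastPathComponent)
    }
    return data
}

/// A caller that deals with the error by turning it into a user message.
func describeLoad(at url: URL) -> String {
    do {
        let data = try loadHeader(at: url)
        return "loaded \(data.count) bytes"
    } catch let error as WeightsError {
        return "bad file: \(error)"
    } catch {
        return "could not read file: \(error.localizedDescription)"
    }
}
```

A caller that does not want to handle the error would instead mark itself as throws and use try, deferring the decision to its own caller.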
Fatal Error
Errors that are the result of programming errors (bugs) are typically treated as fatal errors. For example Array[index] will produce a fatal error if the index is out of bounds (but remember that the equivalent on Dictionary is not!). Arithmetic overflow or integer divide by zero is another example.
Another class of errors that are typically fatal are things like malloc failures or stack overflow -- generally things that a program should not attempt to recover from.
The MLX errors are a curious mix of these. Clearly reading a safetensors file should throw an Error if the file is unreadable, etc. (and it does).
Where does an array broadcast error fit? Is it like an out of bounds array index? Should it produce an optional? Should it throw? Currently it is modeled as a fatal error -- it is a programmer error to use broadcasting incorrectly. This is the case when the shapes are under program control.
What about loading weights for a model? Or when a user somehow supplies the MLXArray (and shape)? A developer could write all the checks to verify the compatibility of the shapes, but this is complex and there is no good way to do this without duplicating code that is inside the MLX core. The withError mechanism does allow you to call these methods to test the compatibility, but is that the correct mechanism?
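For a sense of what such a hand-written check involves, here is a sketch of a broadcast compatibility test. It assumes NumPy-style broadcasting rules (align trailing dimensions; two sizes are compatible when they are equal or one of them is 1); broadcastShape is a hypothetical helper, not part of mlx-swift, and it illustrates exactly the kind of duplicated logic mentioned above.

```swift
/// Returns the broadcast result shape, or nil if the shapes are incompatible.
/// Assumes NumPy-style rules; this duplicates logic that lives in the core
/// library and could drift from it.
func broadcastShape(_ a: [Int], _ b: [Int]) -> [Int]? {
    let rank = max(a.count, b.count)
    // Left-pad the shorter shape with 1s so both have the same rank.
    let pa = Array(repeating: 1, count: rank - a.count) + a
    let pb = Array(repeating: 1, count: rank - b.count) + b

    var result = [Int]()
    for (x, y) in zip(pa, pb) {
        if x == y || y == 1 {
            result.append(x)
        } else if x == 1 {
            result.append(y)
        } else {
            return nil  // neither equal nor 1: cannot broadcast
        }
    }
    return result
}

// The shapes from the withError example: [2, 5] and [3, 5] do not broadcast.
if broadcastShape([2, 5], [3, 5]) == nil {
    print("shapes [2, 5] and [3, 5] are not compatible")
}
```

Even this covers only one operation; indexing, reshaping and dtype checks would each need their own duplicated rules.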
I don't think putting throws / try on every method and call is the correct approach -- every call can fail, so it falls a little bit into the malloc failure case, but it is not a perfect match.
Looking for opinions here!