
Cannot catch the exception when the app crashes in the background

Open taocheng-njfu opened this issue 2 months ago • 12 comments

Image

Why can't the exception below be caught? How can I catch it when the app is running in the background?

Image

taocheng-njfu avatar Sep 29 '25 09:09 taocheng-njfu

This is thrown on the metal worker thread -- it is possible that this is not set up to deliver this back to the calling thread (the one calling eval).

davidkoski avatar Sep 29 '25 15:09 davidkoski

I think this is the same as ml-explore/mlx-swift#274 -- the suggestion to try and catch the error didn't work because the error wasn't surfaced there.

davidkoski avatar Sep 29 '25 19:09 davidkoski

> This is thrown on the metal worker thread -- it is possible that this is not set up to deliver this back to the calling thread (the one calling eval).

I think so too. Is there any way to catch this exception? I've been blocked for 3 or 4 days and I'm out of ideas.

taocheng-njfu avatar Sep 30 '25 09:09 taocheng-njfu

> I think this is the same as #274 -- the suggestion to try and catch the error didn't work because the error wasn't surfaced there.

Image Image

It still cannot catch the exception.

taocheng-njfu avatar Sep 30 '25 09:09 taocheng-njfu

It would have to be a change on the mx::core side (mlx project).

davidkoski avatar Sep 30 '25 15:09 davidkoski

Let me see about transferring this issue.

davidkoski avatar Sep 30 '25 15:09 davidkoski

Specifically the request is:

  • if there is an uncaught exception on the worker thread, surface that exception (or a proxy for it) back to the thread that called eval

davidkoski avatar Sep 30 '25 15:09 davidkoski
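For readers unfamiliar with the pattern being requested, here is a minimal C++ sketch of surfacing a worker-thread exception on the calling thread using `std::promise` and `std::exception_ptr` from the standard library. It is not MLX code: `run_gpu_work`, `eval_like`, and `try_eval` are hypothetical names invented for illustration, the error message is made up, and nothing here reflects how the MLX scheduler is actually structured. It also says nothing about whether the library's state would still be usable after such a failure.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for work done on the worker thread (not MLX code).
void run_gpu_work(bool fail) {
    if (fail) {
        throw std::runtime_error("command buffer execution failed");
    }
}

// Hypothetical eval-like call: runs the work on a worker thread and
// rethrows any exception from that thread on the calling thread.
void eval_like(bool fail) {
    std::promise<void> done;
    std::future<void> result = done.get_future();

    std::thread worker([&done, fail] {
        try {
            run_gpu_work(fail);
            done.set_value();
        } catch (...) {
            // Capture the exception instead of letting it escape the
            // thread function, which would call std::terminate.
            done.set_exception(std::current_exception());
        }
    });

    worker.join();
    result.get();  // Rethrows the captured exception here, if there was one.
}

// Caller-side view: the failure is now an ordinary, catchable exception.
std::string try_eval(bool fail) {
    try {
        eval_like(fail);
        return "ok";
    } catch (const std::exception& e) {
        return std::string("caught: ") + e.what();
    }
}
```

With this sketch, `try_eval(false)` returns `"ok"` and `try_eval(true)` returns `"caught: command buffer execution failed"` rather than terminating the process.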

Any updates, boss?

taocheng-njfu avatar Oct 13 '25 03:10 taocheng-njfu

@awni I think this requires a change on the mlx (core) side -- do you agree? Can you move it to the mlx repo? Thanks!

davidkoski avatar Oct 13 '25 15:10 davidkoski

Yea I can move it... but it's not likely we are going to add this feature in the near future. The reason is that we don't have guarantees on the state being in a reasonable condition if there is an exception during eval.

It's better to treat eval as something which shouldn't crash in your application. If it is crashing, then we should fix that (if it's an MLX issue) or you should fix it in the calling code if it's an issue there.

awni avatar Oct 13 '25 15:10 awni

I think this might be a race -- the GPU could become unavailable after eval is called, e.g. if an app goes into the background on iOS.

davidkoski avatar Oct 13 '25 16:10 davidkoski

Running eval on my model is slow, taking about 5 seconds. If the app moves to the background while eval is in progress, it crashes, and I currently have no way to prevent it.

taocheng-njfu avatar Oct 14 '25 01:10 taocheng-njfu

Related: https://github.com/ml-explore/mlx/issues/2106, https://github.com/ml-explore/mlx/issues/1231, https://github.com/ml-explore/mlx/issues/1363. We should probably merge them into one issue.

zcbenz avatar Nov 23 '25 00:11 zcbenz