lmdeploy
[Feature] throw Turbomind error to python
Motivation
When TurboMind throws an error, the Python side cannot catch it and continue running.
Related resources
No response
Additional context
No response
Hi @lijing1996 You may provide detailed information about the error reported, how it was triggered, and provide a minimal reproducible example.
When the TurboMind engine reports an error, there are usually two situations: one is an unrecoverable error, such as OOM, which should just be allowed to crash; the other is an error that only affects a specific request, in which case letting that request fail and having the client retry will suffice.
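For the second situation, the client-side retry can be a thin wrapper around whatever inference call is made. A minimal sketch, assuming per-request TurboMind errors surface in Python as exceptions; `infer_fn` is a hypothetical placeholder for the actual inference call, not an lmdeploy API:

```python
import time

def caption_with_retry(infer_fn, request, max_retries=3, backoff_s=1.0):
    """Retry a single failed request instead of crashing the whole job.

    `infer_fn` is a placeholder for the client's inference call (not an
    actual lmdeploy API). A per-request engine error is assumed to
    propagate to Python as a RuntimeError.
    """
    last_exc = None
    for attempt in range(max_retries):
        try:
            return infer_fn(request)
        except RuntimeError as exc:  # assumed error type for a failed request
            last_exc = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_exc
```

With this pattern, a transient failure on one request delays only that request; everything else keeps running.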
> Hi @lijing1996 You may provide detailed information about the error reported, how it was triggered, and provide a minimal reproducible example.
> When the TurboMind engine reports an error, there are usually two situations: one is an unrecoverable error, such as OOM, which should just be allowed to crash; the other is an error that only affects a specific request, in which case letting that request fail and having the client retry will suffice.
Regarding the first case: could it catch the error and then re-import and re-load the model? I found it sometimes hit OOM in my case with a large batch size; however, with a small batch size, the speed was low.
> In such a case, could it catch the error and then re-import and re-load the model?
In this situation, catching the error is meaningless, as it is a fatal error; it should just be allowed to crash to expose the problem. Also, I believe this is a bug that should be fixed. Could you provide detailed steps to reproduce it, including the model, request parameters, and specific request content? For a program that runs long-term on the server side, stability is very important, especially for internet services.
> In such a case, could it catch the error and then re-import and re-load the model?
> In this situation, catching the error is meaningless, as it is a fatal error; it should just be allowed to crash to expose the problem. Also, I believe this is a bug that should be fixed. Could you provide detailed steps to reproduce it, including the model, request parameters, and specific request content? For a program that runs long-term on the server side, stability is very important, especially for internet services.
It is just an OOM error. I use a VLM to caption a lot of images, so I need to restart after a crash.
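Since a fatal OOM kills the Python process, one way to automate the restart is to run the captioning job in a child process and relaunch it when it dies. A minimal sketch of that supervisor pattern, not lmdeploy functionality; the worker script itself is assumed to checkpoint its progress and skip already-captioned images on restart:

```python
import subprocess
import sys

def run_with_restart(cmd, max_restarts=5):
    """Re-launch a captioning worker whenever its process dies.

    `cmd` is the worker command line (e.g. a script that loads the model
    and captions images). A fatal engine error such as OOM terminates
    the child with a nonzero exit code, so we restart it; resuming from
    the last completed image is the worker script's responsibility.
    """
    ret = 1
    for restart in range(max_restarts + 1):
        ret = subprocess.call(cmd)
        if ret == 0:  # worker finished all images
            return 0
        print(f"worker exited with code {ret}, restart {restart + 1}",
              file=sys.stderr)
    return ret  # give up after max_restarts relaunches
```

Pairing this with a smaller batch size on the retries is one way to trade a little speed for not losing the whole run.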