lmdeploy
[Feature] throw Turbomind error to python
Motivation
When TurboMind throws an error, the Python side cannot catch it and continue running.
Related resources
No response
Additional context
No response
Hi @lijing1996 You may provide detailed information about the error reported, how it was triggered, and provide a minimal reproducible example.
When the TurboMind engine reports an error, there are usually two situations: one is an unrecoverable error, such as OOM, which should just be allowed to crash; the other is an error that only affects a specific request, in which case letting that request fail and having the client retry will suffice.
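For the second situation, the client-side retry can be a thin wrapper around whatever inference call is made. A minimal sketch, assuming per-request TurboMind errors surface in Python as exceptions; `infer_fn` is a hypothetical placeholder for the actual inference call, not an lmdeploy API:

```python
import time

def caption_with_retry(infer_fn, request, max_retries=3, backoff_s=1.0):
    """Retry a single failed request instead of crashing the whole job.

    `infer_fn` is a placeholder for the client's inference call (not an
    actual lmdeploy API). A per-request engine error is assumed to
    propagate to Python as a RuntimeError.
    """
    last_exc = None
    for attempt in range(max_retries):
        try:
            return infer_fn(request)
        except RuntimeError as exc:  # assumed error type for a failed request
            last_exc = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_exc
```

With this pattern, a transient failure on one request delays only that request; everything else keeps running.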
> Hi @lijing1996 You may provide detailed information about the error reported, how it was triggered, and provide a minimal reproducible example.
> When the TurboMind engine reports an error, there are usually two situations: one is an unrecoverable error, such as OOM, which should just be allowed to crash; the other is an error that only affects a specific request, in which case letting that request fail and having the client retry will suffice.
Regarding the first case: could it catch the error and then re-import and re-load the model? I found it sometimes hit OOM in my case with a large batch size; however, with a small batch size, the speed was low.
> In such a case, could it catch the error and then re-import and re-load the model?
In this situation, catching the error is meaningless, as it is a fatal error; it should just be allowed to crash to expose the problem. Also, I believe this is a bug that should be fixed. Could you provide detailed steps to reproduce it, including the model, request parameters, and specific request content? For a program that runs long-term on the server side, stability is very important, especially for internet services.
> In such a case, could it catch the error and then re-import and re-load the model?
> In this situation, catching the error is meaningless, as it is a fatal error; it should just be allowed to crash to expose the problem. Also, I believe this is a bug that should be fixed. Could you provide detailed steps to reproduce it, including the model, request parameters, and specific request content? For a program that runs long-term on the server side, stability is very important, especially for internet services.
It is just an OOM error. I use a VLM to caption a lot of images, so I need to restart after a crash.
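Since a fatal OOM kills the Python process, one way to automate the restart is to run the captioning job in a child process and relaunch it when it dies. A minimal sketch of that supervisor pattern, not lmdeploy functionality; the worker script itself is assumed to checkpoint its progress and skip already-captioned images on restart:

```python
import subprocess
import sys

def run_with_restart(cmd, max_restarts=5):
    """Re-launch a captioning worker whenever its process dies.

    `cmd` is the worker command line (e.g. a script that loads the model
    and captions images). A fatal engine error such as OOM terminates
    the child with a nonzero exit code, so we restart it; resuming from
    the last completed image is the worker script's responsibility.
    """
    ret = 1
    for restart in range(max_restarts + 1):
        ret = subprocess.call(cmd)
        if ret == 0:  # worker finished all images
            return 0
        print(f"worker exited with code {ret}, restart {restart + 1}",
              file=sys.stderr)
    return ret  # give up after max_restarts relaunches
```

Pairing this with a smaller batch size on the retries is one way to trade a little speed for not losing the whole run.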