transformerlab-app
Add GPU Split Strategy for Inference similar to LM Studio
A user Frank talked to mentioned that the GPU split feature is one of the things they like most about running model inference in LM Studio.
We should add an equivalent GPU split strategy here. This will likely involve work in transformerlab-inference: a reasonable path is to add support to the fastchat_server plugin first, then propagate it to the other loader plugins.
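As a rough sketch of what such a setting might compute, the helper below turns a split strategy into a per-GPU memory budget dict of the shape that HuggingFace `transformers` accepts as `max_memory` alongside `device_map="auto"`. The function name, the strategy names, and the "cap-at-the-smallest-GPU" heuristic for the even split are all hypothetical illustrations, not LM Studio's or fastchat's actual API:

```python
def gpu_split_max_memory(gpu_memory_gib, strategy="even", ratios=None):
    """Hypothetical helper: map a GPU split strategy to a max_memory dict.

    gpu_memory_gib: list of per-GPU capacities in GiB, indexed by device id.
    strategy: "even" balances layers uniformly; "custom" weights GPUs
    by `ratios` (one positive weight per GPU).
    """
    n = len(gpu_memory_gib)
    if strategy == "even":
        # Giving every GPU the same budget (the smallest card's capacity)
        # encourages the dispatcher to spread layers evenly.
        cap = min(gpu_memory_gib)
        return {i: f"{cap}GiB" for i in range(n)}
    if strategy == "custom":
        if ratios is None or len(ratios) != n:
            raise ValueError("custom strategy needs one ratio per GPU")
        # Scale each GPU's budget relative to the most-weighted GPU,
        # never exceeding its physical capacity.
        peak = max(ratios)
        return {
            i: f"{gpu_memory_gib[i] * ratios[i] / peak:.1f}GiB"
            for i in range(n)
        }
    raise ValueError(f"unknown strategy: {strategy!r}")
```

A dict like this could then be passed through to `AutoModelForCausalLM.from_pretrained(..., device_map="auto", max_memory=...)`; fastchat already exposes related knobs (`--num-gpus`, `--max-gpu-memory`) on its model worker, so the plugin change may be mostly about surfacing and translating a strategy setting.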
Reference: