transformerlab-app
Add GPU Split Strategy for Inference similar to LM Studio
A user Frank talked to mentioned that the GPU split feature is one of the things they like most about running model inference in LM Studio.
We should add an equivalent GPU split strategy here. This will likely involve work in transformerlab-inference: a reasonable path is to add support to the fastchat_server plugin first, then propagate it to the other loader plugins.
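As a rough sketch of what such a setting might compute, the helper below turns a split strategy into a per-GPU memory budget dict of the shape that HuggingFace `transformers` accepts as `max_memory` alongside `device_map="auto"`. The function name, the strategy names, and the "cap-at-the-smallest-GPU" heuristic for the even split are all hypothetical illustrations, not LM Studio's or fastchat's actual API:

```python
def gpu_split_max_memory(gpu_memory_gib, strategy="even", ratios=None):
    """Hypothetical helper: map a GPU split strategy to a max_memory dict.

    gpu_memory_gib: list of per-GPU capacities in GiB, indexed by device id.
    strategy: "even" balances layers uniformly; "custom" weights GPUs
    by `ratios` (one positive weight per GPU).
    """
    n = len(gpu_memory_gib)
    if strategy == "even":
        # Giving every GPU the same budget (the smallest card's capacity)
        # encourages the dispatcher to spread layers evenly.
        cap = min(gpu_memory_gib)
        return {i: f"{cap}GiB" for i in range(n)}
    if strategy == "custom":
        if ratios is None or len(ratios) != n:
            raise ValueError("custom strategy needs one ratio per GPU")
        # Scale each GPU's budget relative to the most-weighted GPU,
        # never exceeding its physical capacity.
        peak = max(ratios)
        return {
            i: f"{gpu_memory_gib[i] * ratios[i] / peak:.1f}GiB"
            for i in range(n)
        }
    raise ValueError(f"unknown strategy: {strategy!r}")
```

A dict like this could then be passed through to `AutoModelForCausalLM.from_pretrained(..., device_map="auto", max_memory=...)`; fastchat already exposes related knobs (`--num-gpus`, `--max-gpu-memory`) on its model worker, so the plugin change may be mostly about surfacing and translating a strategy setting.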
Reference: