Feature Request: Integrate GPU Offloading for Prompt Processing

Open wallacelance opened this issue 1 year ago • 5 comments

We can connect a Linux or Windows machine with a dedicated GPU (such as an RTX 5090) to a Mac over Thunderbolt 5. What I'm unsure about is whether prompt processing (the compute-intensive phase) can be offloaded to the machine with the NVIDIA GPU while the Mac handles token generation. That capability would be extremely useful.

Prompt processing on a Mac is painfully slow, especially with larger context windows. Offloading it to a computer with an NVIDIA GPU would save significant time on long prompts.
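To make the request concrete, here is a rough back-of-envelope sketch of when the split would pay off. It rests on common rules of thumb, not on anything measured: prompt processing is treated as compute-bound (about 2 FLOPs per parameter per token), token generation as memory-bandwidth-bound (weights read once per token), and the KV cache is assumed to be shipped over the link once after prefill. Every name here (`Device`, `compare`, and so on) and every number is a hypothetical placeholder, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    flops: float          # sustained compute in FLOP/s (placeholder, not measured)
    mem_bandwidth: float  # memory bandwidth in bytes/s (placeholder, not measured)


def prefill_seconds(dev: Device, n_params: float, prompt_tokens: int) -> float:
    # Assumption: prefill is compute-bound, ~2 FLOPs per parameter per token.
    return 2.0 * n_params * prompt_tokens / dev.flops


def decode_seconds(dev: Device, model_bytes: float, new_tokens: int) -> float:
    # Assumption: decode is bandwidth-bound, weights are read once per token.
    return new_tokens * model_bytes / dev.mem_bandwidth


def transfer_seconds(kv_bytes: float, link_bandwidth: float) -> float:
    # Time to move the KV cache from the GPU box to the Mac over the link.
    return kv_bytes / link_bandwidth


def compare(mac: Device, gpu: Device, n_params: float, model_bytes: float,
            prompt_tokens: int, new_tokens: int,
            kv_bytes_per_token: float, link_bandwidth: float):
    """Return (mac_only_seconds, split_seconds) under the assumptions above."""
    decode = decode_seconds(mac, model_bytes, new_tokens)
    mac_only = prefill_seconds(mac, n_params, prompt_tokens) + decode
    kv_bytes = kv_bytes_per_token * prompt_tokens
    split = (prefill_seconds(gpu, n_params, prompt_tokens)
             + transfer_seconds(kv_bytes, link_bandwidth)
             + decode)
    return mac_only, split


if __name__ == "__main__":
    # Made-up round numbers purely to exercise the functions.
    mac = Device("mac", flops=10.0, mem_bandwidth=100.0)
    gpu = Device("gpu", flops=100.0, mem_bandwidth=50.0)
    mac_only, split = compare(mac, gpu, n_params=5, model_bytes=10,
                              prompt_tokens=100, new_tokens=10,
                              kv_bytes_per_token=2, link_bandwidth=20.0)
    print(f"mac only: {mac_only:.1f}s, split: {split:.1f}s")
```

The point of the sketch is only that the split wins when the prefill time saved on the GPU exceeds the KV cache transfer time, which is why long prompts are the interesting case. Real numbers would depend on the model, quantization, and actual link throughput.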

Mar 13 '25 03:03 wallacelance

Came here to request the same.

Mar 14 '25 17:03 jharmon96

Also requesting this

Mar 25 '25 00:03 frenzybiscuit

https://blog.exolabs.net/nvidia-dgx-spark/

Is this complete??

Dec 13 '25 20:12 jharmon96