Feature Request: Integrate GPU Offloading for Prompt Processing
We can connect a Linux/Windows machine equipped with a dedicated GPU (such as an RTX 5090) to the Mac using Thunderbolt 5. However, I'm not sure whether it is possible to offload prompt processing (the compute-intensive part) to the machine with the NVIDIA GPU while using the Mac for token generation. Such a capability would be extremely useful.
Prompt processing on a Mac is a pain, especially with larger context windows. If we could offload it to a computer with an NVIDIA GPU, it would save significant time and greatly improve overall efficiency.
Came here to request the same.
Also requesting this
https://blog.exolabs.net/nvidia-dgx-spark/
Is this complete?