Charles Srisuwananukorn
For inference, I saw that some folks on Discord were able to run the model on multiple cards [in this thread](https://discordapp.com/channels/1082503318624022589/1082510608123056158/1084210191635058759). I haven't had a chance to try it myself. For the...
Of course! https://discord.gg/9Rk6sSeWEG
I'm re-purposing this issue to track adding multi-GPU inference documentation to the repo.
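In the meantime, here's a rough sketch of what multi-GPU inference could look like using Hugging Face Accelerate's `device_map="auto"`, which shards the model across all visible GPUs. The model name and the `<human>:`/`<bot>:` prompt format match what we publish, but treat the dtype and generation settings as assumptions rather than the repo's official inference path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/GPT-NeoXT-Chat-Base-20B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # shard layers across all visible GPUs (needs `pip install accelerate`)
    torch_dtype="auto",  # keep the checkpoint's native dtype
)

prompt = "<human>: Hello!\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

No promises this matches what folks did in the Discord thread, but it's a reasonable starting point until we land proper docs.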
@zhangce, any updates on this?
Thanks for the patch, @juncongmoo! Unfortunately, the `from_raw_prompt` method that we released was only half implemented. It needs to create an instance of `Conversation`, passing a human id and a...
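For reference, a completed version might look roughly like the sketch below. The class skeleton, constructor signature, and `_prompt` attribute name are assumptions about how `Conversation` is put together, not the final implementation:

```python
class Conversation:
    def __init__(self, human_id, bot_id):
        self._human_id = human_id
        self._bot_id = bot_id
        self._prompt = ""

    @classmethod
    def from_raw_prompt(cls, raw_prompt, human_id, bot_id):
        # Hypothetical sketch: build a Conversation with the given ids
        # and seed it with the existing raw prompt text.
        conversation = cls(human_id, bot_id)
        conversation._prompt = raw_prompt
        return conversation
```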
Sorry, the README is out of date. I'll delete the README.
You'll need more than 40GB of VRAM to run the model. An 80GB A100 is definitely enough. A 48GB A40 might work, but that would be cutting it a little...
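For a quick sanity check (assuming the 20B-parameter chat model): 20B parameters at 2 bytes each in fp16 is already about 37 GiB for the weights alone, before activations and the KV cache. Something like this prints the back-of-envelope number alongside your GPU's actual capacity:

```python
import torch

# Weights alone: 20B params * 2 bytes (fp16), before activations/KV cache.
weight_gib = 20e9 * 2 / 1024**3
print(f"weights alone: ~{weight_gib:.0f} GiB")  # ~37 GiB

# Compare against what the GPU actually has.
if torch.cuda.is_available():
    total = torch.cuda.get_device_properties(0).total_memory
    print(f"GPU 0 capacity: {total / 1024**3:.0f} GiB")
```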
Thank you for the detailed bug report. Let me try to reproduce this.
It looks like NVIDIA's [nccl](https://github.com/NVIDIA/nccl) only supports Linux, I'm afraid. I don't see any packages built for Windows on conda-forge.
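If you want to experiment anyway, PyTorch's `gloo` backend does ship on Windows and can stand in for NCCL in `torch.distributed`, though it's CPU-oriented and I haven't tested it with this repo. A minimal sketch:

```python
import platform

import torch.distributed as dist

# NCCL is Linux-only; fall back to gloo elsewhere. Assumes the usual
# env:// rendezvous variables (MASTER_ADDR, MASTER_PORT, RANK,
# WORLD_SIZE) are already set.
backend = "nccl" if platform.system() == "Linux" else "gloo"
dist.init_process_group(backend=backend)
print(f"initialized {backend} backend, rank {dist.get_rank()}")
```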
Thanks for responding, @davismartens!