llama
How to run 30B on 4 GPUs interactively
It works with predefined prompts. How can I change it to a chat mode like ChatGPT? I use:
if local_rank == 0:
prompts = input("User:")
It doesn't work.
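The likely reason this fails is that only rank 0 reads from stdin, so the other ranks never see the prompt and the collective calls inside generation fall out of sync. A minimal sketch of the usual fix, assuming `torch.distributed` is already initialized (the llama example scripts do this during setup): read the prompt on rank 0, then broadcast it to every rank. The function name `get_user_prompt` is illustrative, not from the repo.

```python
import torch.distributed as dist

def get_user_prompt(local_rank: int) -> str:
    """Read a prompt on rank 0 and broadcast it to all ranks.

    Assumes the process group is already initialized, as the
    llama example scripts do before generation. Illustrative
    sketch, not code from the repository.
    """
    # Only rank 0 talks to the terminal; other ranks pass a placeholder.
    obj = [input("User: ")] if local_rank == 0 else [None]
    # broadcast_object_list replaces the placeholder with rank 0's string.
    dist.broadcast_object_list(obj, src=0)
    return obj[0]
```

Every rank then passes the same returned prompt into `generator.generate(...)`, so the model-parallel collectives stay in lockstep.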
I made a fork that supports interactive jobs across multiple machines: https://github.com/LambdaLabsML/llama/blob/b8cb25d01d0563bba12f265649079061f3ed753e/interactive.py#L122-L137
Closing as we released Llama 2 chat. Feel free to re-open as needed.