PaLM
Inference on CPU or MPS (Arm-based Mac)?
Is there any workaround for running inference on CPU or on my Arm-based Mac M1? I am currently trying to run it on a Mac M1 and I am getting the following error:
/Users/pawandeepsingh/Documents/Development/llm/PaLM/inference.py:50 in main
❱ 50 │ model = torch.hub.load("conceptofmind/PaLM", args.model).to(device).to(dtype)
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False.
If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu')
to map your storages to the CPU.
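For reference, I am choosing the device roughly like this (just a sketch, assuming a PyTorch build with MPS support; otherwise it falls back to CPU):

import torch

# Use the MPS backend on Apple Silicon when available, otherwise fall back to plain CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
dtype = torch.float32  # assuming fp32 as the safe default off-GPU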
Thanks.
I will have to convert it to C++ at some point in the near future.
Thanks for the quick reply. I will be waiting, and I will also be looking to contribute to that. :)
You can also map the model to the CPU by instantiating it directly and loading the checkpoint with map_location:

import torch
from palm import PaLM  # model class from this repo; adjust the import to match your local checkout

device = torch.device("cpu")
# Instantiate the 410M configuration and load the checkpoint weights onto the CPU.
model = PaLM(
    num_tokens=50304, dim=1024, depth=24, dim_head=128, heads=8,
    flash_attn=False, qk_rmsnorm=False,
).to(device).eval()
checkpoint = torch.load('./palm_410m_8k_v0.pt', map_location=device)
model.load_state_dict(checkpoint)
I still need to build the C++ version, but this should work in the meantime. I will put a note in the documentation.
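As a quick sanity check after loading (a sketch, assuming the forward pass takes a batch of token ids and returns logits over the vocabulary, matching num_tokens above):

tokens = torch.randint(0, 50304, (1, 128))  # dummy token ids; 50304 matches num_tokens above
with torch.no_grad():
    logits = model(tokens.to(device))  # expected shape: (1, 128, 50304)
print(logits.shape)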
Should the parameters in this script be changed for, say, the 1B version?