
Inference on CPU or MPS (Arm-based Mac)?

Pawandeep-prog opened this issue 2 years ago · 4 comments

Is there any workaround for running inference on CPU or on my Arm-based Mac M1? I am currently trying to run on a Mac M1 and getting the following error:

    /Users/pawandeepsingh/Documents/Development/llm/PaLM/inference.py:50 in main
        model = torch.hub.load("conceptofmind/PaLM", args.model).to(device).to(dtype)

    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False.
    If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu')
    to map your storages to the CPU.

Thanks.

Pawandeep-prog avatar May 12 '23 03:05 Pawandeep-prog
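
The error arises because the checkpoint was saved from a CUDA device, so torch.load needs an explicit map_location on machines without CUDA. A minimal device-selection sketch, assuming a PyTorch build with MPS support (the checkpoint filename follows the one used later in this thread):

    import torch

    # Prefer MPS on Apple Silicon when the PyTorch build supports it,
    # otherwise fall back to the CPU.
    device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

    # Map the CUDA-saved checkpoint onto the CPU first, as the error message
    # suggests; the model itself can be moved to `device` after loading.
    checkpoint = torch.load("./palm_410m_8k_v0.pt", map_location=torch.device("cpu"))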

I will have to convert it to C++ at some point in the near future.

conceptofmind avatar May 12 '23 03:05 conceptofmind

Thanks for the quick reply. I'll be waiting, and I'll also look into contributing to that. :)

Pawandeep-prog avatar May 12 '23 03:05 Pawandeep-prog


You can map the model to the CPU as well by doing:

    import torch
    # `PaLM` is the model class from this repository (conceptofmind/PaLM);
    # import it according to how you have the repo installed.

    device = torch.device("cpu")

    # Build the 410M configuration and put it in eval mode on the CPU.
    model = PaLM(
        num_tokens=50304, dim=1024, depth=24, dim_head=128, heads=8,
        flash_attn=False, qk_rmsnorm=False,
    ).to(device).eval()

    # map_location remaps the CUDA-saved tensors onto the CPU.
    checkpoint = torch.load('./palm_410m_8k_v0.pt', map_location=device)
    model.load_state_dict(checkpoint)

I still need to build the C++ version, but this should work in the meantime. I will put a note in the documentation.

conceptofmind avatar May 16 '23 22:05 conceptofmind
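
Once the state dict is loaded this way, a quick smoke test is possible. A minimal sketch, assuming the model's forward pass takes a (batch, seq_len) tensor of token IDs and returns per-position logits over the vocabulary:

    # Random token IDs within the 50304-token vocabulary; real inputs would
    # come from the tokenizer used to train the checkpoint.
    tokens = torch.randint(0, 50304, (1, 64))

    with torch.no_grad():
        logits = model(tokens)

    # Expected shape: (1, 64, 50304) if the forward returns per-token logits.
    print(logits.shape)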

Should the parameters in this script be changed for, say, the 1B version?

tomsib2001 avatar Sep 14 '23 23:09 tomsib2001
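
One general way to check which constructor arguments a given checkpoint expects is to inspect the parameter shapes in its state dict before building the model. A minimal sketch, with a hypothetical checkpoint filename:

    import torch

    # Hypothetical filename; substitute the 1B checkpoint you downloaded.
    state_dict = torch.load("./palm_1b_8k_v0.pt", map_location="cpu")

    # Embedding and attention weight shapes reveal the vocabulary size,
    # model width (dim), and number of layers to pass to PaLM(...).
    for name, tensor in state_dict.items():
        print(name, tuple(tensor.shape))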