
Running Dolly (predict/inference) on a Mac / CPU

nardi-yuval opened this issue 2 years ago · 1 comment

Hi,

  1. Why is running Dolly on a Mac EXTREMELY slow? Isn't it conceptually just calling a "predict" function? Do you need a GPU for reasonable runtime?
  2. What would be the best way for someone who wants to build a LangChain chain that uses Dolly, but also wants to be able to debug the code?

Thanks, Yuval

nardi-yuval · Apr 28 '23 19:04

These models are far too large to run reasonably on CPUs, yes; you need an NVIDIA GPU, so this won't work on Macs. See https://github.com/databrickslabs/dolly/issues/67 for some attempts to run it via MPS (Apple's Metal backend), but it doesn't sound like that works.
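The device trade-off described above can be sketched as a small helper. This is a hypothetical function for illustration, not code from the Dolly repo; the preference order (CUDA, then MPS, then CPU) reflects the answer's caveats:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose an inference device for a large model like Dolly.

    Prefers NVIDIA CUDA. MPS is reported not to work reliably for
    Dolly (see issue #67), and CPU inference on a multi-billion-
    parameter model is extremely slow, so both are fallbacks only.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"  # experimental; issue #67 suggests this may fail
    return "cpu"      # works, but expect very long inference times
```

In practice the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.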

Dolly is not code; it's a model. You can use LangChain with it, and there are examples in this repo. You can debug your own Python code as usual, but that isn't related to the model itself. You can, however, set verbose=True on LangChain chains to have them show you exactly what they are doing with the LLM, which often helps 'debug' in that sense.
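To illustrate the idea behind verbose=True: a chain with verbosity enabled prints the prompt it sends to the LLM and the raw text it gets back. This is a hypothetical stand-alone wrapper showing that concept, not LangChain's actual implementation:

```python
from typing import Callable

def make_verbose(llm_call: Callable[[str], str],
                 verbose: bool = True) -> Callable[[str], str]:
    """Wrap an LLM call so each invocation logs its input and output,
    mimicking what verbose=True does for a LangChain chain."""
    def wrapped(prompt: str) -> str:
        if verbose:
            print(f"> Prompt sent to LLM:\n{prompt}")
        result = llm_call(prompt)
        if verbose:
            print(f"> Raw LLM output:\n{result}")
        return result
    return wrapped

# Usage with a dummy LLM standing in for Dolly:
echo_llm = make_verbose(lambda p: p.upper())
```

Seeing the fully rendered prompt this way is usually the fastest path to diagnosing why a chain produces unexpected output.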

srowen · Apr 29 '23 14:04