Sean Owen comments

Results: 245 comments of Sean Owen

I want to use accelerate for multi-card inference, but I don't know how to set some of the parameters

I don't think you necessarily have to set that. Are you seeing an issue? Otherwise you typically set this to the name of the repeated module in the transformer architecture....

OSError: [Errno 28] No space left on device

You ran out of disk space, that's all. Where did you save stuff? Sometimes your local root volume is small, and most of the storage is in mounted EBS volumes.
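One way to find out which volume is full is to compare free space across the locations you write to. The following is a minimal sketch using only the Python standard library; the paths in the usage section are hypothetical examples, not locations named in the thread.

```python
import shutil


def free_gib(path):
    """Return free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def report(paths):
    """Map each readable path to its free space in GiB, skipping missing ones."""
    result = {}
    for path in paths:
        try:
            result[path] = round(free_gib(path), 1)
        except OSError:
            # Path does not exist or cannot be inspected; leave it out.
            continue
    return result


if __name__ == "__main__":
    # Hypothetical locations: root volume, a mounted data volume, temp space.
    for path, gib in report(["/", "/mnt", "/tmp"]).items():
        print(f"{path}: {gib} GiB free")
```

If the root volume shows little free space while a mounted volume has plenty, pointing checkpoints and caches at the mounted volume is the likely fix.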

Running Dolly (predict/inference) on a Mac / CPU

These models are far too large to run reasonably on CPUs, yes. You need an NVIDIA GPU, so this won't work on Macs. See https://github.com/databrickslabs/dolly/issues/67 for some attempts to get...

CUDA out of memory. Can this be run on a p3.16xlarge?

See guidance here: https://github.com/databrickslabs/dolly#v100-gpus-1. A p3dn.24xlarge (32GB V100s) would be better; the p3.16xlarge has 16GB V100s. You may be able to make it work by configuring optimizer offload _and_ turning down batch...

CUDA out of memory. Can this be run on a p3.16xlarge?

16GB is small for training, yeah. You can try param offload too. But then it'll be slower. You want bigger GPUs - maybe g5 / A10?
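To see roughly why 16GB per GPU is tight, a back-of-the-envelope estimate helps. The sketch below assumes the common rule of thumb for mixed-precision training with Adam: about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state), excluding activations. The 7B parameter count and the 8 x 16GB layout are illustrative assumptions, and the function names are hypothetical.

```python
def training_state_gib(num_params, bytes_per_param=16):
    """Estimate weights + gradients + optimizer state, in GiB.

    Assumes mixed-precision Adam: 2 (fp16 weights) + 2 (fp16 grads)
    + 12 (fp32 master weights, momentum, variance) bytes per parameter.
    Activations and framework overhead are NOT included.
    """
    return num_params * bytes_per_param / 2**30


def headroom_gib(num_params, num_gpus, gib_per_gpu):
    """Memory left across all GPUs after model state, assuming perfect sharding."""
    return num_gpus * gib_per_gpu - training_state_gib(num_params)


if __name__ == "__main__":
    params = 7e9  # assumed model size, for illustration only
    print(f"model state: {training_state_gib(params):.1f} GiB")
    print(f"headroom on 8 x 16 GiB: {headroom_gib(params, 8, 16):.1f} GiB")
    print(f"headroom on 8 x 32 GiB: {headroom_gib(params, 8, 32):.1f} GiB")
```

Under these assumptions the model state alone consumes most of an 8 x 16GB machine, which is why offloading optimizer state or parameters to CPU memory (at the cost of speed) or moving to larger GPUs comes up.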

CUDA out of memory. Can this be run on a p3.16xlarge?

Never seen that one - are there other errors? This is just saying "something went wrong". Make sure you made all the settings in the notebook, and the suggested configurations. It...

CUDA out of memory. Can this be run on a p3.16xlarge?

OOM on the GPU or VM? You are following https://github.com/databrickslabs/dolly#a10-gpus-1 right? That instance was enough IIRC to train 7B, or at least it started to. If it fails late in...

CUDA out of memory. Can this be run on a p3.16xlarge?

Oh, you are trying to train on a g5.4xlarge? I misread, crossed it with the OP's thread. That's too little mem. You want a multi-GPU setup with more mem like...

CUDA out of memory. Can this be run on a p3.16xlarge?

Yep, doesn't look right. Not sure how you have set up the instance, so pretty hard to debug. Probably mismatched versions of CUDA libraries.

CUDA out of memory. Can this be run on a p3.16xlarge?

I think the problem is very long inputs. You can filter them or use a smaller batch size, indeed. I'll have to change guidance if 3 isn't working. Yeah you...
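Filtering out very long inputs before training can be sketched as follows. This is a minimal illustration, not the project's own preprocessing: `filter_by_length` and the `max_tokens` threshold are hypothetical, and whitespace splitting stands in for a real tokenizer, which you would pass in as `count_tokens`.

```python
def filter_by_length(records, max_tokens, count_tokens=None):
    """Keep only records whose token count is at most `max_tokens`.

    `count_tokens` is any callable mapping text to an integer length.
    By default it splits on whitespace, a rough stand-in for a tokenizer.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())
    return [r for r in records if count_tokens(r) <= max_tokens]


if __name__ == "__main__":
    samples = [
        "short prompt",
        "a much longer prompt " * 50,  # 200 whitespace-separated tokens
    ]
    kept = filter_by_length(samples, max_tokens=100)
    print(f"kept {len(kept)} of {len(samples)} records")
```

Because a batch is padded to its longest sequence, a few extreme outliers can dominate peak GPU memory, so dropping them may help even when the batch size stays the same.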