Errors executing torchrun for train.py on Apple Silicon M2 Pro
When attempting to fine-tune, I'm getting the following error in the output:
RuntimeError: Distributed package doesn't have NCCL built in
Searching around indicates this error is related to CUDA and other NVIDIA GPU functionality.
So I added the following line to train.py, which is supposed to force CPU-only training (the same workaround used in another Meta-related repo, https://github.com/markasoftware/llama-cpu): torch.distributed.init_process_group("gloo")
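For reference, here's roughly how that line sits in train.py (a minimal sketch; the exact insertion point is my approximation, not code from the repo):

```python
import torch.distributed as dist

# Force the CPU-capable gloo backend instead of NCCL, which isn't
# built into the macOS wheels of PyTorch. torchrun supplies the
# MASTER_ADDR/MASTER_PORT/RANK env vars used for initialization.
if not dist.is_initialized():
    dist.init_process_group(backend="gloo")
```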
Now the NCCL error goes away, but I get this error instead:
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
Again, I have no NVIDIA GPU or related software on my system. I've tried a number of workarounds for Apple Silicon, but haven't gotten very far.
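My understanding is that on a machine like this the code should fall back to CPU (or Apple's MPS backend) rather than calling into CUDA; something along these lines (an illustrative sketch, not actual code from train.py):

```python
import torch

# Pick whichever device this PyTorch build actually supports.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")  # Apple Silicon GPU backend
else:
    device = torch.device("cpu")
print(f"Using device: {device}")
```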
Anything I'm missing here?
I'm running Python 3.10.9 with all requirements.txt entries installed via pip.
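For completeness, here's a quick check I can run to confirm what this PyTorch build actually supports (a sketch that assumes nothing about train.py):

```python
import torch
import torch.distributed as dist

# Report which compute devices and distributed backends are available.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())
print("NCCL backend built:", dist.is_nccl_available())
print("Gloo backend built:", dist.is_gloo_available())
```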