minimal-llama
PEFT + PP bug?
Hello, could you please elaborate on what "Seems buggy, don't use this yet." means for the 8-bit + pipeline parallel example? What specifically is the bug? Does it affect training results, or is it a tooling issue? I've been waiting to be able to fine-tune the 65B model for a while now, and if there's anything I can do to help test or fix this bug, I'd love some pointers. Thanks!
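
For context, this is roughly the kind of setup I have in mind (a minimal sketch using transformers + peft + bitsandbytes, not the repo's actual script; the model path and LoRA hyperparameters below are placeholders):

```python
# Minimal sketch (not the repo's code): load LLaMA in 8-bit sharded across GPUs
# via device_map="auto" (naive pipeline parallelism) and attach a LoRA adapter.
# Model path and LoRA hyperparameters are placeholders.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model

model_path = "path/to/llama-65b-hf"  # placeholder path

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,        # bitsandbytes int8 quantization
    device_map="auto",        # shard layers across available GPUs
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

If the bug is in something like the layer-to-device assignment or the gradient flow through the quantized layers, I'm happy to run whatever repro or test you suggest.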