minimal-llama
PEFT + PP bug?
Hello, could you please elaborate on what "Seems buggy, don't use this yet." means for the 8-bit + pipeline parallel example? What specifically is the bug? Does it affect training results, or is it a tooling issue? I've been waiting to be able to fine-tune the 65B model for a while now, and if there's anything I can do to help test or fix this bug, I'd love some pointers. Thanks!
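
For context, this is roughly the kind of setup I have in mind (a minimal sketch using transformers + peft + bitsandbytes, not the repo's actual script; the model path and LoRA hyperparameters below are placeholders):

```python
# Minimal sketch (not the repo's code): load LLaMA in 8-bit sharded across GPUs
# via device_map="auto" (naive pipeline parallelism) and attach a LoRA adapter.
# Model path and LoRA hyperparameters are placeholders.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model

model_path = "path/to/llama-65b-hf"  # placeholder path

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,        # bitsandbytes int8 quantization
    device_map="auto",        # shard layers across available GPUs
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

If the bug is in something like the layer-to-device assignment or the gradient flow through the quantized layers, I'm happy to run whatever repro or test you suggest.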