sean-jang00

Results 2 comments of sean-jang00

@Dead-Bytes By 'tried Gemma-2-27B', do you mean that you performed QAT from scratch? How did you quantize the Gemma-2 models?

@dawnmsg Would training a 70B model from scratch at 1-bit precision require fewer resources than training at full precision? If the resource requirements are similar, would general developers still be able...