sean-jang00
Results: 2 comments
@Dead-Bytes By 'tried Gemma-2-27B', do you mean that you performed QAT from scratch? How did you quantize the Gemma-2 models?
@dawnmsg Would training a 70B model from scratch at 1-bit precision require fewer resources than training at full precision? If similar resources are needed, would general developers still be able...