Iambestfeed

15 comments of Iambestfeed

> For QA datasets, we use query as `query`, and use answer/context as `pos`. We use the candidates (except the ground truth) provided by the original dataset as `neg`. ...
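For reference, here is a minimal sketch of what one training instance in that `query`/`pos`/`neg` format could look like; the field layout and the example values are my assumptions for illustration, not taken from the dataset itself:

```python
# Hypothetical QA training instance in the query/pos/neg format
# described in the quote above (all values are made up).
example = {
    "query": "who wrote the novel Moby-Dick",
    # positive: the answer/context from the original dataset
    "pos": ["Moby-Dick was written by Herman Melville."],
    # negatives: candidate passages from the dataset, excluding the ground truth
    "neg": [
        "The Old Man and the Sea was written by Ernest Hemingway.",
        "Heart of Darkness was written by Joseph Conrad.",
    ],
}
```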

> Hello! Hard to tell. Can you please send the logs of the run? What was the error message?

I don't keep a log, but what I see is that...

> Are you running on Linux? Try to increase the swap size; most of the time Linux just prints "Killed" if it runs out of CPU memory.

Hmm, looks like the problem...

> I haven't tested a 3b model, or anything OpenLlama for that matter. Would you mind sharing the quantized model on HF? I can give it a test and see...

@turboderp This is the message that is output when I use `--eval`:

```shell
(base) dungnt@symato:~/ext_hdd/repos/Nhan/GPTQ-for-LLaMa$ CUDA_VISIBLE_DEVICES=0 python llama.py /home/dungnt/ext_hdd/repos/Nhan/GPTQ-for-LLaMa/checkpoints/open_llama_3b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors llama3b-4bit-128g.safetensors --eval
[2023-07-02...
```

> Okay, I found another bug, specifically affecting 3b act-order models. With the latest commit I get ppl = 7.86, and I'm going to write off the difference as this...

> @Iambestfeed How did you quantize the 3B model specifically?
>
> I tried GPTQ-for-LLaMa (no 3B option when running llama.py); AutoGPTQ seems to work, but I wanted to...

@intfloat I'm looking at quantization algorithms like AWQ and GPTQ, and they seem to work by minimizing a loss based on the model's output (so I'm hoping to have `lm_head` so I...
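To make the output-matching idea concrete, here is a rough sketch of the per-layer objective that GPTQ/AWQ-style methods minimize; this is an illustration of the general technique (reduce the quantized layer's output error on calibration activations), not code from either project, and `quantize` is a hypothetical stand-in:

```python
import torch

def layer_output_error(W: torch.Tensor, X: torch.Tensor, quantize) -> torch.Tensor:
    """Squared output error || W X - quantize(W) X ||_F^2 for one linear layer.

    W: original fp16 weight matrix, shape (out_features, in_features)
    X: calibration activations, shape (in_features, n_samples)
    quantize: hypothetical callable returning a low-bit approximation of W
    """
    W_q = quantize(W)
    # The loss is measured on the layer's *output*, which is why having
    # lm_head available in the checkpoint matters for end-to-end evaluation.
    return torch.linalg.norm(W @ X - W_q @ X) ** 2
```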

@intfloat Hmm, finetuning with an MSE loss sounds like a practical idea. Do you think I should implement it with the fp16 model as the teacher and the 4-bit model as the student?
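For what it's worth, a minimal sketch of that teacher/student setup might look like the following; the function and model names are hypothetical, and in practice one would usually update LoRA adapters or quantization scales rather than the 4-bit weights directly:

```python
import torch
import torch.nn.functional as F

def distill_step(teacher_fp16, student_4bit, optimizer, input_ids):
    """One MSE-distillation step: match the 4-bit student's logits
    to the frozen fp16 teacher's logits (hypothetical setup)."""
    with torch.no_grad():
        teacher_logits = teacher_fp16(input_ids).logits
    student_logits = student_4bit(input_ids).logits
    loss = F.mse_loss(student_logits.float(), teacher_logits.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```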

@intfloat I actually don't have many GPUs to run training experiments. Do you have any other tips for optimizing GPU memory usage and speeding up inference? (I hope they...