kwrobel.eth

Results: 59 comments by kwrobel.eth

@haileyschoelkopf Unfortunately, it does not work. Log probs can be returned only for generated tokens. The current implementation calls the API for every answer with `questions: answer_i` and max_gen_tokens=0. My understanding of the current openai...
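For context, a minimal local sketch (not the harness's or the OpenAI wrapper's actual code) of the quantity these tasks need: the summed log probability of a fixed answer string given the question. The model name is just a small stand-in. An endpoint that only returns logprobs for newly generated tokens cannot provide this when max_gen_tokens=0.

```python
# Sketch: loglikelihood of a fixed answer given a question, computed locally
# with transformers. Illustrative only; "gpt2" is a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

question = "Q: What is the capital of Poland?\nA:"
answer = " Warsaw"

ctx_ids = tok(question, return_tensors="pt").input_ids
ans_ids = tok(answer, return_tensors="pt").input_ids
input_ids = torch.cat([ctx_ids, ans_ids], dim=1)

with torch.no_grad():
    logits = model(input_ids).logits  # [1, seq_len, vocab]

# logits at position i predict token i+1, so shift by one
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
targets = input_ids[:, 1:]
token_logprobs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

# keep only the answer tokens' logprobs and sum them
answer_logprob = token_logprobs[:, ctx_ids.shape[1] - 1 :].sum()
print(f"loglikelihood of answer: {answer_logprob.item():.4f}")
```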

@haileyschoelkopf The score for bs=1 is 0.7033 and for bs=4 it is 0.7111 (with stderr 0.01). The logprobs differ between bs=1 and bs=4: ![image](https://github.com/EleutherAI/lm-evaluation-harness/assets/1849959/befa690a-29bc-4ab9-b256-e1e70e43e825) Flash attention without compile causes an error on my...
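As a rough way to inspect such batch-size-dependent differences (a sketch, not the harness's code; the small model name is only a stand-in), one can score the same text alone and inside a padded batch and compare the resulting logprobs:

```python
# Sketch: compare logprobs for the same sequence at bs=1 vs inside a padded batch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

texts = ["The capital of Poland is Warsaw.", "Short text"]

def sequence_logprob(batch_texts):
    enc = tok(batch_texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**enc).logits
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = enc.input_ids[:, 1:]
    per_tok = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    mask = enc.attention_mask[:, 1:]  # ignore padded positions
    return (per_tok * mask).sum(dim=-1)

alone = sequence_logprob([texts[0]])[0]
batched = sequence_logprob(texts)[0]
print(f"bs=1: {alone.item():.6f}  bs=2: {batched.item():.6f}  "
      f"abs diff: {(alone - batched).abs().item():.2e}")
```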

@LSinev 1. I expect that vllm will also be faster for loglikelihood tasks. Versions: transformers 4.39.1, vllm 0.3.2; this repo's state is from yesterday, cffc1bd3fd69453eaa75da891256682123226f0f. 2. Nothing special. I have bolded...

You can replicate the vllm loglikelihood slowness and the different scores with e.g. ``lm_eval --model hf --model_args "pretrained=mistralai/Mistral-7B-v0.1" --output_path "date/`date +%s`" --tasks belebele_pol_Latn --num_fewshot 0 --device cuda:0 --batch_size 1 --log_samples``. hf bs=1...

Thank you! bs auto usually doesn't work, and this is also the case here (OOM):
- vllm bs=auto: OOM
- vllm bs=32: OOM
- vllm bs=16: 01:31, 0.3856

Thanks! vllm bs=auto, max_model_len=4096: 01:33 (+01:30 for `Processed prompts`?), score 0.3856

Using bs=auto with vllm adds some extra time for "Processed prompts" - I don't know what it is, but in the end it is slower than bs=1. The remaining issues are:...

Regarding the different scores with different batch sizes: I have run the evaluation with **max_len=1**, **2 examples**, and bs 1 vs. 2.
```
lm_eval --model hf --model_args "pretrained=mistralai/Mistral-7B-v0.1,max_length=1" --output_path "date/`date +%s`" --tasks...
```
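For reference, a rough programmatic equivalent of this bs=1 vs bs=2 comparison via lm_eval's Python API (`simple_evaluate`); the argument names follow the documented interface as I understand it and may need adjusting for your version:

```python
# Sketch: run the same tiny evaluation twice, only changing batch_size,
# and compare the reported metrics. Mirrors the CLI command above.
import lm_eval

common = dict(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1,max_length=1",
    tasks=["belebele_pol_Latn"],
    num_fewshot=0,
    limit=2,          # only 2 examples, as in the experiment above
    device="cuda:0",
)

res_bs1 = lm_eval.simple_evaluate(batch_size=1, **common)
res_bs2 = lm_eval.simple_evaluate(batch_size=2, **common)

print(res_bs1["results"]["belebele_pol_Latn"])
print(res_bs2["results"]["belebele_pol_Latn"])
```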