kwrobel.eth
Should it be merged?
@haileyschoelkopf Unfortunately, it does not work. Log probs can be returned only for generated tokens. The current implementation calls, for every answer, `questions: answer_i` with max_gen_tokens=0. My understanding of the current openai...
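For context, a minimal sketch of the behavior being described, assuming the current openai Python client (>=1.0); the model name and prompt are placeholders. Chat completions attach logprobs only to the generated tokens in `choices[0].logprobs.content`, so a request that generates nothing has nothing to score:

```python
# Sketch, not the harness implementation: logprobs cover only the
# *generated* tokens, so scoring a prompt this way requires the
# model to actually emit tokens.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",          # placeholder model
    messages=[{"role": "user", "content": "2 + 2 ="}],
    logprobs=True,
    top_logprobs=1,
    max_tokens=5,
)
# One entry per generated token; prompt tokens get no logprobs,
# so with max tokens forced to 0 there would be nothing to read here.
for item in resp.choices[0].logprobs.content:
    print(item.token, item.logprob)
```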
It should help :)
@haileyschoelkopf The score for bs=1 is 0.7033 and for bs=4 it is 0.7111 (with stderr 0.01). Logprobs differ between bs=1 and bs=4. Flash attention without compile causes an error on my...
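One common explanation for small logprob differences across batch sizes is kernel numerics: with correct padding and attention masking, batched and unbatched scores are mathematically identical, but batched matmuls accumulate in a different order, which shows up mainly in fp16/bf16 on GPU. A quick way to check, sketched here with a small HF model standing in for Mistral-7B (the model name and texts are placeholders):

```python
# Sketch: score the same sentences one at a time vs. in one padded
# batch and compare the summed loglikelihoods. On fp32 CPU the diff
# is usually ~0; it grows with fp16/bf16 GPU kernels.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in for mistralai/Mistral-7B-v0.1
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

texts = ["Paris is the capital of France.", "The quick brown fox jumps."]

@torch.no_grad()
def seq_logprobs(batch):
    enc = tok(batch, return_tensors="pt", padding=True)
    logp = model(**enc).logits.log_softmax(-1)
    labels = enc.input_ids[:, 1:]                  # token t predicted from <t
    tok_lp = logp[:, :-1].gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    mask = enc.attention_mask[:, 1:].bool()        # drop padded positions
    return [row[m].sum().item() for row, m in zip(tok_lp, mask)]

bs1 = [seq_logprobs([t])[0] for t in texts]  # one sequence at a time
bsN = seq_logprobs(texts)                    # all sequences in one padded batch
for t, a, b in zip(texts, bs1, bsN):
    print(f"{a:.6f} vs {b:.6f} (diff {a - b:+.2e})  {t!r}")
```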
@LSinev 1. I expect that vllm will also be faster for loglikelihood tasks. transformers 4.39.1, vllm 0.3.2; this repo's state is from yesterday, cffc1bd3fd69453eaa75da891256682123226f0f. 2. Nothing special. I have bolded...
You can replicate the vllm loglikelihood slowness and the different scores with e.g.:

```
lm_eval --model hf --model_args "pretrained=mistralai/Mistral-7B-v0.1" --output_path "date/"`date +%s` --tasks belebele_pol_Latn --num_fewshot 0 --device cuda:0 --batch_size 1 --log_samples
```

hf bs=1...
Thank you! bs=auto usually doesn't work, and this is also the case here: OOM.

- vllm bs=auto: OOM
- vllm bs=32: OOM
- vllm bs=16: 01:31, 0.3856
Thanks!

- vllm bs=auto max_model_len=4096: 01:33 (+01:30 for `Processed prompts`?), 0.3856
Using bs=auto with vllm adds some extra time for "Processed prompts"; I don't know what that step is, but in the end it is slower than bs=1. The remaining issues are:...
About the different scores with different batch sizes: I have run an evaluation with **max_length=1**, **2 examples**, and bs=1 vs. bs=2.

```
lm_eval --model hf --model_args "pretrained=mistralai/Mistral-7B-v0.1,max_length=1" --output_path "date/"`date +%s` --tasks...
```
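The same bs=1 vs. bs=2 comparison could also be driven from Python; a sketch assuming the harness's `simple_evaluate` entrypoint (the exact field names in the returned samples, like `resps` below, may differ between lm-eval versions):

```python
# Sketch only: run the minimal experiment twice, differing only in
# batch size, and diff the raw per-answer loglikelihoods.
from lm_eval import simple_evaluate

common = dict(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1,max_length=1",
    tasks=["belebele_pol_Latn"],
    num_fewshot=0,
    limit=2,            # the "2 examples" from the experiment
    device="cuda:0",
    log_samples=True,
)
run_bs1 = simple_evaluate(batch_size=1, **common)
run_bs2 = simple_evaluate(batch_size=2, **common)

# Compare the logged responses between the two runs, sample by sample.
for s1, s2 in zip(run_bs1["samples"]["belebele_pol_Latn"],
                  run_bs2["samples"]["belebele_pol_Latn"]):
    print(s1["resps"], "vs", s2["resps"])
```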