kwrobel.eth
I have found the exact place: https://github.com/EleutherAI/lm-evaluation-harness/blob/e9d429e105fa95dd4a1b5606b306289d207fcf62/lm_eval/models/huggingface.py#L1049 and replicated it with minimal code (I get the same numbers at this line). A model loaded on CPU with bfloat16 gives the same numbers: ```...
Why do you think it is a problem with the model implementation? But yes, it is not related to the lm-evaluation-harness repository. Maybe it is some GPU optimization (cuBLAS?).
The same issue occurs with `meta-llama/Llama-2-7b-chat-hf`. Maybe it is resolved in a newer cuBLAS: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-release-12-3-update-1 ? I am using CUDA 12.1 with cuBLAS 12.1.3.1.
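For context, one general way low-precision results can differ between backends is accumulation order, since different kernels may reduce in a different order. The sketch below is only an illustration of that effect and not a diagnosis of this issue: `to_bf16` and `bf16_sum` are hypothetical helpers that emulate bfloat16 rounding in pure Python, and they do not call PyTorch or cuBLAS.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (hypothetical helper).

    Emulation only: goes through float32 bits and rounds to nearest even;
    NaN and overflow handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_sum(values):
    """Sum left to right, rounding the accumulator to bfloat16 after each add."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + to_bf16(v))
    return acc


values = [1.0] + [0.001] * 1000

forward = bf16_sum(values)             # large value first: small terms are rounded away
backward = bf16_sum(reversed(values))  # small terms first: they accumulate before 1.0 is added

print(forward, backward)
```

The two sums differ even though the inputs are identical, so bit-exact agreement between two bfloat16 implementations is not guaranteed in general.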
Thank you! It would be helpful to support it.
What do you mean? It is still not implemented.
@abidlabs Thanks, but it doesn't work correctly. After clicking an example button, different text is provided. You can check here: https://huggingface.co/spaces/speakleash/Bielik-7B-Instruct-v0.1
It would be very helpful to support mermaid.
speakleash/Bielik-7B-Instruct-v0.1 supports a system prompt, so the problem must be with the data: "Conversation roles must alternate user/assistant/user/assistant/..." You can't apply the chat template for `conversation=[{"role": "assistant", "content": r"%%%%%%%%%%%%%%%%"}],` because before assistant role...
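To make the point concrete, here is a minimal sketch of the kind of check that the error message implies. It is an assumption about the rule (an optional leading system message, then strictly alternating user/assistant turns starting with user), not the model's actual chat template code, and `find_role_violation` is a hypothetical helper.

```python
def find_role_violation(conversation):
    """Return the index of the first message that breaks alternation, or None.

    Assumed rule (not taken from the real template): one optional leading
    "system" message, then user/assistant/user/assistant/...
    """
    start = 1 if conversation and conversation[0]["role"] == "system" else 0
    expected = "user"
    for i in range(start, len(conversation)):
        if conversation[i]["role"] != expected:
            return i
        expected = "assistant" if expected == "user" else "user"
    return None


# The conversation from the failing test: it starts with an assistant turn.
bad = [{"role": "assistant", "content": r"%%%%%%%%%%%%%%%%"}]

# A conversation that satisfies the assumed rule.
good = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!"},
]

print(find_role_violation(bad))   # index of the offending message
print(find_role_violation(good))  # None when the roles alternate
```

Running a check like this on the test data before applying the chat template would show that the input, rather than system prompt support, triggers the error.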
@Carbon225 Do you think this is ready?
I don't understand what it is about. However, now caching is not working with `openai-completions`.