Dan Saattrup Smart
I like the idea of testing what the chance is for a model to have "cheated" on a benchmark. However, the method in the paper that you link to requires,...
Makes sense! In that case we probably need to create a `gemini_models` module, analogous to `openai_models`, which has classes `GeminiTokenizer` and `GeminiModel`, as well as a `model_setups.gemini` module analogous to...
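A minimal sketch of what such a module could look like (the module and class names `gemini_models`, `GeminiTokenizer` and `GeminiModel` come from the comment above; all method signatures and bodies are assumptions, mirroring only the general shape of a model wrapper):

```python
# gemini_models.py -- hypothetical sketch, structured analogously to the
# existing `openai_models` module; signatures and bodies are assumptions.


class GeminiTokenizer:
    """Minimal tokenizer wrapper for Gemini models (sketch)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def tokenize(self, text: str) -> list[str]:
        # Placeholder: a real implementation would call the Gemini
        # tokenisation endpoint; here we just split on whitespace.
        return text.split()


class GeminiModel:
    """Minimal model wrapper for Gemini models (sketch)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id
        self.tokenizer = GeminiTokenizer(model_id)

    def generate(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the Gemini API.
        raise NotImplementedError("API call not implemented in this sketch")
```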
I'm not familiar with LlamaIndex, but evaluating the closed models currently involves more than "merely" generating sequences. JSON mode is heavily used for NER tasks, and we use the logits...
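To illustrate the JSON-mode point, here is a sketch of a request payload for an NER task as it could be sent to the OpenAI chat completions API (the `response_format={"type": "json_object"}` parameter is the API's real JSON mode; the model name, prompt wording and output schema are illustrative assumptions, not the project's actual prompts):

```python
# Sketch: build a JSON-mode chat request for named entity recognition.
# No network call is made here; this only constructs the payload dict.
def build_ner_request(text: str) -> dict:
    return {
        "model": "gpt-4o",  # assumed model name, for illustration only
        "response_format": {"type": "json_object"},  # force valid JSON output
        "messages": [
            {
                "role": "system",
                "content": (
                    "Extract named entities from the user's text and reply "
                    'with JSON of the form {"entities": [...]}.'
                ),
            },
            {"role": "user", "content": text},
        ],
    }
```

Because decoding is constrained to valid JSON, the entity spans can be parsed deterministically instead of scraped from free-form text, which is why plain sequence generation is not enough here.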
@Mikeriess This one yep (and in general Mixtral-type models) - thanks 🙂
> Alright - this model is gated, so I'll need to use my access token. What is the name of the argument to add here? (couldn't find it in the...
> Getting a `[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 134.00 MiB. GPU `. I assume 96GB VRAM isn't enough for this model :-) That's exactly the reason...
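Some back-of-the-envelope arithmetic shows why a large model can overflow a 96GB card even before activations or the KV cache are counted (the 70B parameter count is an illustrative assumption, not the model from the comment):

```python
# Rough VRAM estimate for holding model weights alone -- no activations,
# optimizer state or KV cache, which all add further memory on top.
def weight_vram_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Gigabytes needed to hold the weights at the given precision."""
    return n_params * bytes_per_param / 1024**3


# e.g. a 70B-parameter model in fp16 (2 bytes per parameter):
needed = weight_vram_gb(70e9)  # ~130 GB, already more than a 96 GB card
```

Halving the precision (e.g. 8-bit or 4-bit quantisation) scales this estimate down proportionally, which is the usual workaround when the weights alone exceed the available VRAM.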
This is live on the leaderboards now, thanks to @Mikeriess! 🎉
This seems similar-ish to [this issue](https://github.com/ROCm/ROCm/issues/2536#issuecomment-1755682831). Can you see if any of these, or combinations of them, work?
> export PYTORCH_ROCM_ARCH="gfx1031"
> export HSA_OVERRIDE_GFX_VERSION=10.3.1
> export HIP_VISIBLE_DEVICES=0
> export ROCM_PATH=/opt/rocm
Progress! What happens if you set `AMD_SERIALIZE_KERNEL=3`? Maybe we'll get a more informative error.
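For reference, the variables from this and the previous comment could be collected into one snippet (the GFX values are copied from the linked issue and are specific to a gfx1031 card; they may need adjusting for other GPUs):

```shell
#!/bin/sh
# ROCm workarounds from the linked issue, plus the debug flag suggested
# above. Values target a gfx1031 GPU and are not universal.
export PYTORCH_ROCM_ARCH="gfx1031"      # target GPU architecture
export HSA_OVERRIDE_GFX_VERSION=10.3.1  # report a supported GFX version
export HIP_VISIBLE_DEVICES=0            # restrict to the first GPU
export ROCM_PATH=/opt/rocm              # ROCm installation prefix
export AMD_SERIALIZE_KERNEL=3           # serialize kernel launches for
                                        # more informative error messages
```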