Nicholas Carlini
This is expected behavior due to how I implement the 2-ply search. It's more of a 1.75-ply search: the second ply is evaluated as part of the "is my current...
Ah very nice. I should write some code that will merge together multiple independent datasets to make a larger matrix... I guess we don't know what Mistral Medium is, but...
You'll need to write an interface that connects this project up to Azure. You can see this PR for someone who added a new endpoint (#17), or reference some of...
Yeah, maybe this isn't a bad idea... I did update my initial blog post with claude-3 and mistral large (https://nicholas.carlini.com/writing/2024/my-benchmark-for-large-language-models.html). But maybe having an explicit leaderboard wouldn't be so bad....
The work is just that you have to create a model/[llm].py file. I suppose for the case of huggingface models this should be trivial as long as they have the...
Okay yeah, this is bad. @jeffreywpli your task will not be included in tbench v2 unless these issues are corrected.
On the former case: we don't really require that solution.sh is *good*. Just that it solves the task and passes the tests. If you think that the solution.sh is bad...
I used to write JavaScript games when I was 14 too! I hope you have fun with this. Please do let me know if you have other questions about the...
Fixing the texture to the face of the object is actually hard. The code is not set up in a way to make this possible. The problem is here: the...
The reference solution for problem 73 is, I believe, actually incorrect. The question asks: "Given an array arr of integers, find the minimum number of elements that need to be...