Nicholas Carlini comments

Results: 84 comments of Nicholas Carlini

Easy mate -- ignores direct attack with Queen?

This is expected behavior due to how I implement the 2-ply search. It's more of a 1.75-ply search: the second ply is evaluated as part of the "is my current...

Benchmark for some open source models

Ah very nice. I should write some code that will merge together multiple independent datasets to make a larger matrix... I guess we don't know what Mistral Medium is, but...

How to add Azure api key and endpoint and how to access gpt models based on that.

You'll need to write an interface that connects this project up to Azure. You can see this PR for someone who added a new endpoint (#17), or reference some of...

Would you want to make a leaderboard for this?

Yeah, maybe this isn't a bad idea... I did update my initial blog post with claude-3 and mistral large (https://nicholas.carlini.com/writing/2024/my-benchmark-for-large-language-models.html). But maybe having an explicit leaderboard wouldn't be so bad....

Would you want to make a leaderboard for this?

The work is just that you have to create a model/[llm].py file. I suppose for the case of huggingface models this should be trivial as long as they have the...

super-benchmark-upet cheatable test & oracle solution cheats

Okay yeah, this is bad. @jeffreywpli your task will not be included in tbench v2 unless these issues are corrected.

Task Issue: broken-networking

On the former case: we don't really require that solution.sh is *good*. Just that it solves the task and passes the tests. If you think that the solution.sh is bad...

Changing game textures.

I used to write JavaScript games when I was 14 too! I hope you have fun with this. Please do let me know if you have other questions about the...

Changing game textures.

Fixing the texture to the face of the object is actually hard. The code is not set up in a way to make this possible. The problem is here: the...

Potential test case bugs in the difficult subset

The reference solution for problem 73 is, I believe, actually incorrect. The question asks: "Given an array arr of integers, find the minimum number of elements that need to be...