Tim Ridgely
> Both models answer correctly if asked to explain the answer, but both indeed fail if asked to answer in one number. Very interesting! Yeah, I found that until I...
@usama-openai Thanks for checking this! I've updated the dataset to include only 385 samples; it still has 0.010 accuracy against 3.5-turbo. I also added a generator script for it.