
Methodological error in zero cost, zero time, zero shot notebook

Open stephantul opened this issue 1 year ago • 3 comments

Hi,

I was looking at the zero cost, zero time, zero shot notebook for financial sentiment analysis (i.e., this one), and discovered a methodological error that invalidates the conclusions of the distillation section.

What happens is that the train and test dataframes, i.e., the CSV files loaded from Moritz Laurer's blog, are created by splitting the train split of the dataset (the dataset doesn't have a test split). Later on, when distilling, the authors of the blog post reload the entire train split of the dataset and use it to distill the MLP. This means that the test data is also used to distill the model, which leads to a big overestimation of performance.
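A minimal sketch of the fix, assuming rows can be matched on their text. The rows, the `text` and `label` field names, and the `drop_test_rows` helper are all hypothetical stand-ins, not the notebook's actual code:

```python
# Hypothetical stand-ins for the reloaded train split and the held-out test CSV.
# The real notebook's column names and contents may differ.
full_train_split = [
    {"text": "Operating profit rose to EUR 13.1 mn.", "label": "positive"},
    {"text": "The company will cut 200 jobs.", "label": "negative"},
    {"text": "The meeting is scheduled for Monday.", "label": "neutral"},
    {"text": "Net sales decreased by 5%.", "label": "negative"},
]
test_rows = [
    {"text": "The company will cut 200 jobs.", "label": "negative"},
    {"text": "Net sales decreased by 5%.", "label": "negative"},
]


def drop_test_rows(pool, test, key="text"):
    """Return the rows of `pool` whose `key` value does not occur in `test`."""
    held_out = {row[key] for row in test}
    return [row for row in pool if row[key] not in held_out]


# Only these rows should be labeled by the teacher and used to distill the MLP.
distill_pool = drop_test_rows(full_train_split, test_rows)

# Sanity check: no test sentence is left in the distillation data.
overlap = {row["text"] for row in distill_pool} & {row["text"] for row in test_rows}
assert not overlap
```

Matching on exact text also drops any duplicates of test sentences that ended up in the train portion, which is the safer choice when measuring held-out performance.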

In my experiments, the original PRF scores I got (per-class precision, recall, F1, and support) were:

(array([0.85507246, 0.97348485, 0.94166667]),
 array([0.96721311, 0.96981132, 0.88976378]),
 array([0.90769231, 0.97164461, 0.91497976]),
 array([ 61, 265, 127]))

Which is close to the reported score in the article. If I instead remove the test data from the data used to distill the MLP, I get much lower scores:

(array([0.76785714, 0.87632509, 0.78947368]),
 array([0.70491803, 0.93584906, 0.70866142]),
 array([0.73504274, 0.90510949, 0.74688797]),
 array([ 61, 265, 127]))

These scores are much lower than the reported scores, and also much lower than the LLM scores, which invalidates the conclusion of the notebook and article. Note that these scores are still a bit higher than the scores you would get when just directly optimizing cross entropy, so you could argue that the point still makes sense.
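To put a single number on the drop, the per-class F1 values from the two runs above can be averaged. This is only a rough summary: the unweighted macro average is my choice here, and the variable names are illustrative.

```python
from statistics import mean

# Per-class F1 values copied from the third array of each result above.
f1_with_test_data = [0.90769231, 0.97164461, 0.91497976]
f1_without_test_data = [0.73504274, 0.90510949, 0.74688797]

# Unweighted (macro) average over the three classes.
macro_with = mean(f1_with_test_data)
macro_without = mean(f1_without_test_data)
drop = macro_with - macro_without

print(f"macro-F1, test data included in distillation: {macro_with:.3f}")
print(f"macro-F1, test data removed from distillation: {macro_without:.3f}")
print(f"difference: {drop:.3f}")
```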

If you want, I can open a PR on the notebook.

stephantul, Apr 20 '24 13:04

@MosheWasserb

tomaarsen, Apr 20 '24 18:04

Hi @tomaarsen, sorry, I missed your message :( Great catch. Yes, go ahead and open a PR.

MosheWasserb, May 28 '24 07:05

Hey @MosheWasserb,

Thanks for replying, really appreciated.

Before I submit a PR, could we maybe discuss what you want the final conclusion of the article to look like? Because the part after you reload the dataset doesn't work any more. Should I just remove those parts?

stephantul, Jun 13 '24 18:06