Impact of using widely referenced open source data sets

Open rupnic opened this issue 6 months ago • 3 comments

I'm new to the field and might be completely off the mark here, but was any consideration given to the possibility that the data sets used, being widely referenced and reproduced, formed part of the models' original foundational training data? If so, that could have boosted model performance compared with results on a novel data set.

rupnic, Aug 15 '24 19:08