Versioning for AbsTasks
Currently, AbsTasks only differ by name. Whenever we want to update an abstask there are two things we have to make sure:
- Most people and new tasks use the new version.
- Results on older tasks should preferably not change, especially after we start running the models.
Our solution for this with clustering was to introduce a completely new task, and then create an issue for converting all tasks to the new version. Currently, there is a similar problem with PairClassification #756. Again the new version is no longer backwards compatible and would produce different results from the previous one. @KennethEnevoldsen brought it up that we could have a test to see whether someone was using an old version in a new task, this would have to be hard-coded though, which is not too nice in my view. @Muennighoff also suggested that we rename the outdated abstask, but then we need to rewrite every old file in order to adhere to this principle every time we update an AbsTask.
I have multiple suggestions for what we could do here:
a) We could do the thing we already do with concrete tasks, and go down the superseeded_by road.
b) We could introduce some versioning system for abstasks.
It has been suggested that we could add V1, V2, etc. to task names. I'm not sure, I like this because then we would have to rename every currently existing abstask.
It would be really nice to have a version attribute on AbsTasks, and then an abstask_version attribute on particular tasks, this would however, be incredibly hacky to do with the current way we manage these things (inheritence).
Do you guys have any good ideas on how to solve this? I'm currently trying to write a new AbsTask for PairClassification, and it would be really nice if we could figure some system out.
I would just add a failing test stating "Please don't use AbstaskClustering but instead use AbstaskClusteringFast" with an exception list of all old tasks of the type.
@isaac-chung would love a second opinion here
Another option (which have been used for SummarizationEvaluator) is to use:
DeprecatedSummarizationEvaluator (this also comes with a deprecation warning) and SummarizationEvaluator - this are maintained to ensure backward compatibility but make it clear that you would not use it.
The warning might be too much for AbstaskClustering, but we could still do:
AbstaskClustering -> DeprecatedAbstaskClustering AbstaskClusteringFast -> AbstaskClustering
Alternative we could do: AbstaskClustering -> AbstaskClusteringV1 AbstaskClusteringFast -> AbstaskClusteringV2
It might also be reasonable to not do anything and leave it as is (https://github.com/embeddings-benchmark/mteb/pull/756 is closed so it isn't pressing)
AbstaskClustering -> DeprecatedAbstaskClustering AbstaskClusteringFast -> AbstaskClustering
Either the DeprecatedAbsTask or superseded_by route (leaning this a bit more) might be a cleaner solution for Clustering and Pair Classification IMO. My assumptions are that
- Updating AbsTasks likely won't be a frequent thing in the future.
- We are only dealing with 1 update for each task type
- Previous Tasks will still be runnable if needed
For PairClassification, since the current tasks are binary and "new" tasks would have 3 classes, maybe we could use PairClassificationV2? That way we can avoid renaming everything, and still allow users to run "old" tasks.
1), 2), 3) completely agree.
For PairClassification, since the current tasks are binary and "new" tasks would have 3 classes, maybe we could use PairClassificationV2? That way we can avoid renaming everything, and still allow users to run "old" tasks.
if we go this route we could also simply change: AbstaskClusteringFast -> AbstaskClusteringV2
There is also the option of leaving it as is for now (PairClassificationV2 isn't planned)
Oh right. Leaving as is works for me. I think we can use V2 + superseded by once there's a need for new Pair Classification tasks.
Perfect I am totally fine with this - will close this for now then