Márton Kardos
For something this complex that other people might want to use, this is criminally underdocumented. It would be nice if I didn't have to go through the entire code to...
Working on #434. I still have to add a good test task; if anyone has one, don't hesitate to comment.
I'm gonna practice drums for the rest of the day and probably won't work tomorrow, but for those who are looking to contribute and get some of those juicy points...
For certain datasets, such as [EURLEX](https://huggingface.co/datasets/coastalcph/multi_eurlex), it would be very useful to have a multilabel classification task in the benchmark.
Added a naive implementation of hierarchical clustering along with a hierarchical formulation of SNL and results. If you have any thoughts on how to improve this, feel free to chime...
There are a number of tasks in the benchmark that should be converted to ClusteringFast in order to gain a speedup. Additionally, a lot of tasks that are currently formulated...
## Checklist for adding MMTEB dataset

Reason for dataset addition: Converted both PLSC tasks (S2S, P2P) to hierarchical clustering. #702

- [x] I have tested that the dataset runs with...
## Checklist for adding MMTEB dataset

Reason for dataset addition: Changed ArXiv clustering tasks (S2S and P2P) to hierarchical (two levels separated by dots). #696

- [x] I have tested...
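To illustrate what "two levels separated by dots" could look like in practice, here is a minimal sketch that turns dotted category labels into one label list per hierarchy level. The function names `split_levels` and `labels_per_level` and the example labels are hypothetical, chosen for illustration only; this is not the code from the PR.

```python
def split_levels(label: str, sep: str = ".") -> list[str]:
    """Split a dotted label such as "cs.CL" into one entry per hierarchy level."""
    return label.split(sep)


def labels_per_level(labels: list[str], sep: str = ".") -> list[list[str]]:
    """Return one flat label list per level, so each level can be scored separately.

    Assumption: only levels shared by every label are kept, so a label
    without a dot limits the result to the top level.
    """
    split = [split_levels(label, sep) for label in labels]
    depth = min(len(parts) for parts in split)
    return [[parts[level] for parts in split] for level in range(depth)]


# Hypothetical example labels in the dotted style described above.
example = ["cs.CL", "cs.LG", "math.AG"]
levels = labels_per_level(example)
print(levels[0])  # top-level categories
print(levels[1])  # second-level categories
```

Each per-level list can then be passed to an ordinary flat clustering metric, which is one simple way to evaluate a hierarchical formulation level by level.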
PLSC is naturally formulated as a hierarchical clustering problem, but the current implementation is flat.
## Checklist for adding MMTEB dataset

Reason for dataset addition: VG Clustering was neither hierarchical nor ClusteringFast before. #656

- [x] I have tested that the dataset runs with the...