Márton Kardos
@dokato Let's keep this in a separate PR so we can discuss the two things on different threads (this task and then stratified subsampling). It should also be fine considering...
For this a `ClassifierChain` seems more appropriate as the labels are clearly dependent on each other. Chime in on the discussion at #440. I'm thinking of adding multiple options to...
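For reference, a minimal sketch of what a `ClassifierChain` does with dependent labels, using scikit-learn. The synthetic data, the `LogisticRegression` base model, and the chain order are all assumptions for illustration and are not taken from the discussion at #440.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Synthetic multilabel data (assumption: stands in for real task embeddings).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=3, random_state=0
)
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

# Each classifier in the chain sees the input features plus the labels
# that come earlier in `order`, so label dependencies can be modelled.
chain = ClassifierChain(
    LogisticRegression(max_iter=1000), order=[0, 1, 2], random_state=0
)
chain.fit(X_train, Y_train)

# One binary prediction per label and test sample.
Y_pred = chain.predict(X_test)
print(Y_pred.shape)
```

The chain order matters: a label can only condition on labels placed before it, so in practice one might try several orders or pass `order="random"`.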
I can shoot a bullet but catch me if I fall @KennethEnevoldsen.
@dokato Try formulating it as a retrieval task instead :))
@wissam-sib Please verify that no one has added them yet or is working on a PR, otherwise feel free to go ahead :D
We could also generate clustering datasets from Wikipedia in scarce resource languages by traversing the category hierarchy. I've done this before and have some code lying around if you're interested.
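A rough sketch of the traversal idea, not the code mentioned above. It assumes a toy, made-up category graph held in two dictionaries (`SUBCATEGORIES` and `ARTICLES` are hypothetical names); a real version would fetch these from Wikipedia. Each article gets the top-level category it was reached from as its cluster label, and articles reachable from more than one root are dropped.

```python
from collections import deque

# Toy, made-up hierarchy standing in for a real Wikipedia category graph.
SUBCATEGORIES = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics"],
    "Arts": ["Music"],
}
ARTICLES = {
    "Physics": ["Gravity"],
    "Optics": ["Lens"],
    "Biology": ["Cell"],
    "Music": ["Opera", "Lens"],  # "Lens" is deliberately ambiguous
}


def collect_articles(root, max_depth=2):
    """Breadth-first walk from `root`, gathering articles up to `max_depth`."""
    seen, found = {root}, set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        found.update(ARTICLES.get(category, []))
        if depth < max_depth:
            for child in SUBCATEGORIES.get(category, []):
                if child not in seen:  # guard against cycles
                    seen.add(child)
                    queue.append((child, depth + 1))
    return found


def build_clustering_dataset(roots, max_depth=2):
    """Map each unambiguous article to the root category it belongs to."""
    labels = {}
    for root in roots:
        for article in collect_articles(root, max_depth):
            labels.setdefault(article, set()).add(root)
    # Keep only articles that belong to exactly one root.
    return {a: next(iter(r)) for a, r in labels.items() if len(r) == 1}


dataset = build_clustering_dataset(["Science", "Arts"])
print(dataset)
```

The depth limit is there because category graphs tend to drift off topic the deeper you go; the right cutoff would need checking per language.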
#### Option 1: Training a classifier over actual features: I think one way we could make these tasks a bit more nuanced and natural is by using logistic regression over:...
@KennethEnevoldsen suggested going with Option 1, and I think it's a good idea. @Muennighoff green light?
I agree with Kenneth that neutral is a third thing entirely. I think even if we choose to go with something distance-based we could potentially account for this with Option...
hmm, I guess we should write a new abstask and then add new versions of the tasks? Btw would it not make sense to have some sort of versioning on...