gritlm icon indicating copy to clipboard operation
gritlm copied to clipboard

For document clustering, should we leave instruction blank?

Open griff4692 opened this issue 1 year ago • 3 comments
trafficstars

Thanks -- I am using Grit for document embeddings that will be used to score doc-to-doc similarity.

Should I add an instruction or leave it blank?

Thank you, Griffin

griff4692 avatar Feb 23 '24 23:02 griff4692

Thanks -- I am using Grit for document embeddings that will be used to score doc-to-doc similarity.

Should I add an instruction or leave it blank?

Thank you, Griffin

So is it like STS rather than Retrieval? I would probably add them in that case, but it may make sense to try both.

Muennighoff avatar Feb 24 '24 07:02 Muennighoff

Thanks -- I am using Grit for document embeddings that will be used to score doc-to-doc similarity. Should I add an instruction or leave it blank? Thank you, Griffin

So is it like STS rather than Retrieval? I would probably add them in that case, but it may make sense to try both.

Thanks for the reply! Yes - in order to cluster documents for in-context pre-training (https://arxiv.org/abs/2310.10638).

Was going to try "Identify the main topics from a medical document." but wasn't sure how instructions for embeddings are meant to be worded for gritlm.

griff4692 avatar Feb 24 '24 13:02 griff4692

Thanks -- I am using Grit for document embeddings that will be used to score doc-to-doc similarity. Should I add an instruction or leave it blank? Thank you, Griffin

So is it like STS rather than Retrieval? I would probably add them in that case, but it may make sense to try both.

Thanks for the reply! Yes - in order to cluster documents for in-context pre-training (https://arxiv.org/abs/2310.10638).

Was going to try "Identify the main topics from a medical document." but wasn't sure how instructions for embeddings are meant to be worded for gritlm.

Yeah I think for clustering you'll get slightly better performance if you include an instruction. The one you proposed sounds good to me!

Muennighoff avatar Feb 24 '24 14:02 Muennighoff