DiffCSE
Question about scalability wrt. input length
Howdy,
I was wondering whether any experiments were done with the DiffCSE framework on long inputs (300-500 tokens). In other words, are there any conditions on the training data that are necessary for convergence?
PS - congrats on the paper, it was a really fun read :)