
Request for using TopClus with different pretrained language models

Open RobertoCorti opened this issue 1 year ago • 1 comments

Hi,

I've read your paper and I like this approach. Thank you for sharing the code. I have one question regarding the pretrained language model (PLM) used to obtain the contextualized word representations. I saw in the source code that the model is fixed to the standard 'bert-base-uncased':

https://github.com/yumeng5/TopClus/blob/01e22fb73262bc45d361ec9165bdadbd929ac9a5/src/trainer.py#L22

Suppose I'm interested in applying this method to a corpus of Italian texts. In that case, would it be possible to change this model and use `bert-base-multilingual-uncased` instead?
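
From what I understand of the Hugging Face side, the multilingual checkpoint loads through the same interface as `bert-base-uncased`, so the change on the loading side should only be the model identifier string. A quick generic sketch (standard `transformers` API only, not TopClus-specific code):

```python
from transformers import AutoTokenizer, AutoModel

# Assumption: the multilingual checkpoint is a drop-in replacement on the
# loading side, since it exposes the same BERT-style interface and hidden size.
model_name = "bert-base-multilingual-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Quick sanity check on an Italian sentence.
inputs = tokenizer("Questo è un esempio di frase italiana.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```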

If that's possible, could `pretrained_lm` be made a parameter of the `TopClusTrainer`?
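
Just to sketch what I mean (the names and signature below are only a suggestion, not the actual TopClus API):

```python
class TopClusTrainer:
    # Hypothetical sketch: keep the existing constructor arguments as they are,
    # and add 'pretrained_lm' as a keyword argument defaulting to the model
    # currently hardcoded in trainer.py.
    def __init__(self, pretrained_lm="bert-base-uncased", **kwargs):
        self.pretrained_lm = pretrained_lm
        # ... existing initialization, using self.pretrained_lm wherever
        # 'bert-base-uncased' appears today ...

# e.g. for an Italian corpus:
trainer = TopClusTrainer(pretrained_lm="bert-base-multilingual-uncased")
```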

Thank you.

RobertoCorti avatar Mar 29 '23 08:03 RobertoCorti