MolBERT

Dataset size and creation

Open LivC193 opened this issue 3 years ago • 3 comments

Hi, first of all congrats on your article and the NeurIPS workshop.

I have a few questions:

  1. Regarding fine-tuning: do you update the pre-trained encoder or do you freeze it?
  2. You say that any molecule with an ECFP4 similarity higher than 0.323 to 10 drugs was discarded. I assume this was done for generalisation. However, which similarity metric did you use (Tanimoto, Dice, etc.), and why 0.323? Also, have you performed any similarity-based clustering on the final dataset to ensure that the parsed chemical space is balanced?
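
To make the second question concrete, here is a minimal sketch of why the choice of metric matters for a fixed cutoff. It treats a fingerprint as a plain set of on-bit indices; the two bit sets are made up for illustration and are not real ECFP4 fingerprints, and nothing here is taken from the authors' pipeline. Only the 0.323 cutoff comes from the paper.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: set, b: set) -> float:
    """Dice similarity: 2 * |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical on-bit indices, standing in for two ECFP4 fingerprints.
fp_query = {1, 5, 9, 12, 20, 33}
fp_drug = {5, 9, 14, 41, 50}

THRESHOLD = 0.323  # cutoff quoted in the question

t = tanimoto(fp_query, fp_drug)
d = dice(fp_query, fp_drug)

# The same pair can fall on different sides of the cutoff depending on the metric.
print(f"Tanimoto = {t:.3f}, above cutoff: {t > THRESHOLD}")
print(f"Dice     = {d:.3f}, above cutoff: {d > THRESHOLD}")
```

For this pair the Tanimoto score falls below 0.323 while the Dice score falls above it, so the same molecule would be kept under one metric and discarded under the other.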

LivC193, Dec 13 '20 20:12